Abstract — Several aspects of the problem of asynchronous point-
to-point communication without feedback are developed when the 
source is highly intermittent. In the system model of interest, the 
codeword is transmitted at a random time within a prescribed 
window whose length corresponds to the level of asynchronism 
between the transmitter and the receiver. The decoder operates 
sequentially and communication rate is defined as the ratio 
between the message size and the elapsed time between when 
transmission commences and when the decoder makes a decision. 

For such systems, general upper and lower bounds on capacity 
as a function of the level of asynchronism are established, 
and are shown to coincide in some nontrivial cases. From 
these bounds, several properties of this asynchronous capacity 
are derived. In addition, the performance of training-based 
schemes is investigated. It is shown that such schemes, which 
implement synchronization and information transmission on 
separate degrees of freedom in the encoding, cannot achieve 
the asynchronous capacity in general, and that the penalty is 
particularly significant in the high-rate regime. 

Index Terms — asynchronous communication; bursty commu- 
nication; error exponents; sequential decoding; sparse commu- 
nication; synchronization 



I. Introduction 

INFORMATION-THEORETIC analysis of communication 
systems frequently ignores synchronization issues. In many 
applications where large amounts of data are to be transmitted, 
such simplifications may be justified. Simply prepending a 
suitable synchronization preamble to the initial data incurs 
negligible overhead yet ensures that the transmitter and the 
receiver are synchronized. In turn, various coding techniques 
(e.g., graph based codes, polar codes) may guarantee delay 
optimal communication for data transmission in the sense that 
they can achieve the capacity of the synchronous channel. 

In quantifying the impact due to a lack of synchronization 
between a transmitter and a receiver, it is important to note 
that asynchronism is a relative notion that depends on the size 
of the data to be transmitted. For instance, in the above "low 
asynchronism" setting it is implicitly assumed that the data is
large with respect to the timing uncertainty. 

In a growing number of applications, such as many involv-
ing sensor networks, data is transmitted in a bursty manner.
An example would be a sensor in a monitoring system. By 
contrast with the previous setting, here timing uncertainty is 
large with respect to the data to be transmitted. 

To communicate in such "high asynchronism" regimes, one 
can use the traditional preamble based communication scheme 
for each block. Alternatively, one can pursue a fundamentally 
different strategy in which synchronization is integrated into 
the encoding of the data, rather than separated from it. 

To evaluate the relative merits of such diverse strategies, and 
more generally to explore fundamental performance limits, we 
recently introduced a general information-theoretic model for 
asynchronous communication in [3]. This model extends Shan-
non's original communication model [4] to include asynchro-
nism. In this model, the message is encoded into a codeword
of fixed length, and this codeword starts being sent across a 
discrete memoryless channel at a time instant that is randomly 
and uniformly distributed over some predefined transmission 
window. The size of this window is known to transmitter 
and receiver, and the level of asynchronism in the system 
is governed by the size of the window with respect to the 
codeword length. Outside the information transmission period, 
whose duration equals the codeword length, the transmitter 
remains idle and the receiver observes noise, i.e., random 
output symbols. The receiver uses a sequential decoder whose 
scope is twofold: decide when to decode and what message 
to declare. 

The performance measure is the communication rate which 
is defined as the ratio between the message size and the 
average delay between when transmission starts and when the 
message is decoded. Capacity is the supremum of achievable 
rates, i.e., rates for which vanishing error probability can be 
guaranteed in the limit of long codeword length. 

The scaling between the transmission window and the 
codeword length that meaningfully quantifies the level of 
asynchronism in the system turns out to be exponential, i.e., 
$A = e^{\alpha n}$, where $A$ denotes the size of the transmission
window, where $n$ denotes the codeword length, and where
$\alpha$ denotes the asynchronism exponent. Indeed, as discussed
in [3], if $A$ scales subexponentially in $n$, then asynchronism
doesn't impact communication: the asynchronous capacity 
is equal to the capacity of the synchronous channel. By 
contrast, if the window size scales superexponentially, then 
the asynchrony is generally catastrophic. Hence, exponential 
asynchronism is the interesting regime and we aim to compute 
capacity as a function of the asynchronism exponent.

For further motivation and background on the model, includ- 
ing a summary of related models (e.g., the insertion, deletion, 
and substitution channel model, and the detection and isolation 
model) we refer to [3, Section II]. Accordingly, we omit such
material from the present paper.

The first main result in [3] is the characterization of the
synchronization threshold, which is defined as the largest asyn-
chronism exponent for which it is still possible to guarantee
reliable communication; this result is recalled in Theorem 1
of Section III.

The second main result in [3] (see [3, Theorem 1]) is a lower
bound to capacity. A main consequence of this bound is that 
for any rate below the capacity of the synchronous channel it 
is possible to accommodate a non-trivial asynchronism level, 
i.e., a positive asynchronism exponent. 

While this work focuses on rate, an alternative performance 
metric is the minimum energy (or, more generally, the min-
imum cost) needed to transmit one bit of information asyn-
chronously. For this metric, [5], [6] establish the capacity
per unit cost for the above bursty communication setup. 

We now provide a brief summary of the results contained 
in this paper: 

• General capacity lower bound: Theorem 2 and Corollary 1. The-
orem 2 provides a lower bound to capacity which is
obtained by considering a coding scheme that performs
synchronization and information transmission jointly. The
derived bound results in a much simpler and often much
better lower bound than the one obtained in [3, Theorem
1]. Theorem 2, which holds for arbitrary discrete memo-
ryless channels, also holds for a natural Gaussian setting,
which yields Corollary 1.

• General capacity upper bound: Theorem 3. This bound
and the above lower bound, although not tight in general,
provide interesting and surprising insights into the asyn-
chronous capacity. For instance, Corollary 2 says that, in
general, it is possible to reliably achieve a communication
rate equal to the capacity of the synchronous channel
while operating at a strictly positive asynchronism expo-
nent. In other words, it is possible to accommodate both
a high rate and an exponential asynchronism.
Another insight is provided by Corollary 3, which relates
to the very low rate communication regime. This result 
says that, in general, one needs to (sometimes signifi- 
cantly) back off from the synchronization threshold in 
order to be able to accommodate a positive rate. As a 
consequence, capacity as a function of the asynchronism 
exponent does not, in general, strictly increase as the 
latter decreases. 

• Capacity for channels with infinite synchronization
threshold: Theorem 4. For the class of channels for
which there exists a particular channel input which can't
be confused with noise, a closed-form expression for 
capacity is established. 

• Suboptimality of training based schemes: Theorem 6,
Corollaries 4 and 5. These results show that commu-
nication strategies that separate synchronization from
information transmission do not achieve the asynchronous
capacity in general.

• Good synchronous codes: Theorem 5. This result may
be of independent interest and relates to synchronous com-
munication. It says that any codebook that achieves a
nontrivial error probability contains a large subcodebook,
whose rate is almost the same as the rate of the original
codebook, and whose error probability decays exponen-
tially with the blocklength with a suitable decoder. This
result, which is a byproduct of our analysis, is a stronger
version of [7, Corollary 1.9, p. 107] and its proof amounts
to a tightening of some of the arguments in the proof of
the latter.

It is worth noting that most of our proof techniques differ 
in some significant respects from more traditional capacity 
analysis for synchronous communication — for example, we 
make little use of Fano's inequality for converse arguments. 
The reason for this is that there are decoding error events 
specific to asynchronous communication. One such event is 
when the decoder, unaware of the information transmission 
time, declares a message before transmission even starts. 

An outline of the paper is as follows. Section II summarizes
some notational conventions and standard results we make use
of throughout the paper. Section III describes the communica-
tion model of interest. Section IV contains our main results,
and Section V is devoted to the proofs. Section VI contains
some concluding remarks.

II. Notation and Preliminaries 

In general, we reserve upper case letters for random vari- 
ables (e.g., X) and lower case letters to denote their cor- 
responding sample values (e.g., x), though as is customary, 
we make a variety of exceptions. Any potential confusion is 
generally avoided by context. In addition, we use $x_i^j$ to denote
the sequence $x_i, x_{i+1}, \ldots, x_j$, for $i \le j$. Moreover, when
$i = 1$ we use the usual simpler notation $x^n$ as an alternative
to $x_1^n$. Additionally, $\triangleq$ denotes "equality by definition."

Events (e.g., $\mathcal{E}$) and sets (e.g., $\mathcal{S}$) are denoted using
calligraphic fonts, and if $\mathcal{E}$ represents an event, $\mathcal{E}^c$ denotes its
complement. As additional notation, $\mathbb{P}[\cdot]$ and $\mathbb{E}[\cdot]$ denote the
probability and expectation of their arguments, respectively,
$\|\cdot\|$ denotes the $L_1$ norm of its argument, $|\cdot|$ denotes
absolute value if its argument is numeric, or cardinality if its
argument is a set, $\lfloor\cdot\rfloor$ denotes the integer part of its argument,
$a \wedge b \triangleq \min\{a, b\}$, and $x^+ \triangleq \max\{0, x\}$. Furthermore, we
use $\subset$ to denote nonstrict set inclusion, and use the Kronecker
notation $\mathbb{1}(\mathcal{A})$ for the function that takes value one if the event
$\mathcal{A}$ is true and zero otherwise.

We also make use of some familiar order notation for
asymptotics (see, e.g., [8, Chapter 3]). We use $o(\cdot)$ and $\omega(\cdot)$
to denote (positive or negative) quantities that grow strictly
slower and strictly faster, respectively, than their arguments;
e.g., $o(1)$ denotes a vanishing term and $n/\ln n = \omega(\sqrt{n})$. We
also use $O(\cdot)$ and $\Omega(\cdot)$, defined analogously to $o(\cdot)$ and $\omega(\cdot)$,
respectively, but without the strictness constraint. Finally, we
use $\mathrm{poly}(\cdot)$ to denote a function that does not grow or decay
faster than polynomially in its argument.

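We use $\mathbb{P}(\cdot)$ to denote the probability of its argument, and
use $\mathcal{P}^{\mathcal{X}}$, $\mathcal{P}^{\mathcal{Y}}$, and $\mathcal{P}^{\mathcal{X},\mathcal{Y}}$ to denote the set of distributions over
the finite alphabets $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{X} \times \mathcal{Y}$, respectively, and use
$\mathcal{P}^{\mathcal{Y}|\mathcal{X}}$ to denote the set of conditional distributions of the form
$V(y|x)$ for $(x, y) \in \mathcal{X} \times \mathcal{Y}$.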

For a memoryless channel characterized by channel law
$Q \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$, the probability of the output sequence $y^n \in \mathcal{Y}^n$
given an input sequence $x^n \in \mathcal{X}^n$ is

$$Q(y^n|x^n) \triangleq \prod_{i=1}^{n} Q(y_i|x_i).$$

Throughout the paper, Q always refers to the underlying 
channel and C denotes its synchronous capacity. 
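
As a concrete reading of the product formula above, the following minimal Python sketch evaluates $Q(y^n|x^n)$ for a toy channel. The function name `sequence_likelihood`, the nested-dictionary representation of $Q$, and the binary symmetric channel with crossover probability 0.1 are illustrative assumptions, not objects defined in the paper.

```python
import math

def sequence_likelihood(Q, x_seq, y_seq):
    """Return Q(y^n | x^n) = prod_i Q(y_i | x_i) for a memoryless channel.

    Q is a nested dict (an assumed representation): Q[x][y] = Q(y|x).
    """
    if len(x_seq) != len(y_seq):
        raise ValueError("input and output sequences must have equal length")
    return math.prod(Q[x][y] for x, y in zip(x_seq, y_seq))

# Hypothetical binary symmetric channel with crossover probability 0.1.
bsc = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}

# Two symbols received correctly and one flipped: 0.9 * 0.9 * 0.1.
likelihood = sequence_likelihood(bsc, [0, 1, 1], [0, 1, 0])
```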

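Additionally, we use $J_{\mathcal{X}}$ and $J_{\mathcal{Y}}$ to denote the left and right
marginals, respectively, of the joint distribution $J \in \mathcal{P}^{\mathcal{X},\mathcal{Y}}$, i.e.,

$$J_{\mathcal{X}}(x) \triangleq \sum_{y \in \mathcal{Y}} J(x, y) \quad \text{and} \quad J_{\mathcal{Y}}(y) \triangleq \sum_{x \in \mathcal{X}} J(x, y).$$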

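We define all information measures relative to the natural
logarithm. Thus, the entropy associated with $P \in \mathcal{P}^{\mathcal{X}}$ is¹

$$H(P) \triangleq -\sum_{x \in \mathcal{X}} P(x) \ln P(x),$$

and the conditional entropy associated with $Q \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$ and
$P \in \mathcal{P}^{\mathcal{X}}$ is

$$H(Q|P) \triangleq -\sum_{x \in \mathcal{X}} P(x) \sum_{y \in \mathcal{Y}} Q(y|x) \ln Q(y|x).$$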

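Similarly, the mutual information induced by $J(\cdot, \cdot) \in \mathcal{P}^{\mathcal{X},\mathcal{Y}}$
is

$$I(J) \triangleq \sum_{(x,y) \in \mathcal{X} \times \mathcal{Y}} J(x, y) \ln \frac{J(x, y)}{J_{\mathcal{X}}(x) J_{\mathcal{Y}}(y)},$$

so

$$I(PQ) = \sum_{x \in \mathcal{X}} P(x) \sum_{y \in \mathcal{Y}} Q(y|x) \ln \frac{Q(y|x)}{(PQ)_{\mathcal{Y}}(y)}$$

for $P \in \mathcal{P}^{\mathcal{X}}$ and $Q \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$. Furthermore, the information
divergence (Kullback-Leibler distance) between $P_1 \in \mathcal{P}^{\mathcal{X}}$ and
$P_2 \in \mathcal{P}^{\mathcal{X}}$ is

$$D(P_1\|P_2) \triangleq \sum_{x \in \mathcal{X}} P_1(x) \ln \frac{P_1(x)}{P_2(x)},$$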



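and conditional information divergence is denoted using

$$D(W_1\|W_2|P) \triangleq \sum_{x \in \mathcal{X}} P(x) \sum_{y \in \mathcal{Y}} W_1(y|x) \ln \frac{W_1(y|x)}{W_2(y|x)} = D(PW_1\|PW_2),$$

where $P \in \mathcal{P}^{\mathcal{X}}$ and $W_1, W_2 \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$. As a specialized notation,
we use

$$D_B(\epsilon_1\|\epsilon_2) \triangleq \epsilon_1 \ln \frac{\epsilon_1}{\epsilon_2} + (1 - \epsilon_1) \ln \frac{1 - \epsilon_1}{1 - \epsilon_2}$$

to denote the divergence between Bernoulli distributions with
parameters $\epsilon_1, \epsilon_2 \in [0, 1]$.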

¹In the definition of all such information measures, we use the usual
convention $0 \ln(0/0) = 0$.
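
The information measures defined above translate directly into code. The sketch below is a minimal illustration in Python, working in nats; the dictionary representation of distributions, the function names, and the binary symmetric channel used as an example are assumptions made here for illustration only.

```python
import math

def entropy(P):
    """H(P) in nats, using the convention 0 ln 0 = 0."""
    return -sum(p * math.log(p) for p in P.values() if p > 0)

def kl_divergence(P1, P2):
    """D(P1 || P2) in nats; infinite if P1 puts mass where P2 has none."""
    total = 0.0
    for symbol, p in P1.items():
        if p == 0:
            continue
        q = P2.get(symbol, 0.0)
        if q == 0:
            return math.inf
        total += p * math.log(p / q)
    return total

def output_marginal(P, Q):
    """(PQ)_Y(y) = sum_x P(x) Q(y|x)."""
    marginal = {}
    for x, px in P.items():
        for y, q in Q[x].items():
            marginal[y] = marginal.get(y, 0.0) + px * q
    return marginal

def mutual_information(P, Q):
    """I(PQ) = sum_x P(x) D(Q(.|x) || (PQ)_Y)."""
    marginal = output_marginal(P, Q)
    return sum(px * kl_divergence(Q[x], marginal)
               for x, px in P.items() if px > 0)

def bernoulli_divergence(e1, e2):
    """D_B(e1 || e2) between Bernoulli(e1) and Bernoulli(e2)."""
    return kl_divergence({1: e1, 0: 1 - e1}, {1: e2, 0: 1 - e2})

# Hypothetical binary symmetric channel (crossover 0.1) with uniform input.
bsc = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
uniform = {0: 0.5, 1: 0.5}
info = mutual_information(uniform, bsc)
```

For this toy channel the mutual information equals $\ln 2$ minus the binary entropy of 0.1, which gives a quick sanity check on the implementation.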



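We make frequent use of the method of types [7, Chap-
ter 1.2]. In particular, $\hat{P}_{x^n}$ denotes the empirical distribution
(or type) of a sequence $x^n \in \mathcal{X}^n$, i.e.,²

$$\hat{P}_{x^n}(x) \triangleq \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}(x_i = x).$$

The joint empirical distribution $\hat{P}_{x^n,y^n}$ for a sequence pair
$(x^n, y^n)$ is defined analogously, i.e.,

$$\hat{P}_{x^n,y^n}(x, y) \triangleq \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}(x_i = x,\, y_i = y),$$

and, in turn, a sequence $y^n$ is said to have a conditional
empirical distribution $\hat{P}_{y^n|x^n} \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$ given $x^n$ if for all
$(x, y) \in \mathcal{X} \times \mathcal{Y}$,

$$\hat{P}_{x^n,y^n}(x, y) = \hat{P}_{x^n}(x)\, \hat{P}_{y^n|x^n}(y|x).$$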

As additional notation, $P \in \mathcal{P}^{\mathcal{X}}$ is said to be an $n$-type
if $nP(x)$ is an integer for all $x \in \mathcal{X}$. The set of all $n$-types
over an alphabet $\mathcal{X}$ is denoted using $\mathcal{P}_n^{\mathcal{X}}$. The $n$-type class
of $P$, denoted using $\mathcal{T}_P^n$, is the set of all sequences $x^n$ that
have type $P$, i.e., such that $\hat{P}_{x^n} = P$. A set of sequences is
said to have constant composition if they belong to the same
type class. When clear from the context, we sometimes omit
the superscript $n$ and simply write $\mathcal{T}_P$. For distributions on
the alphabet $\mathcal{X} \times \mathcal{Y}$ the set of joint $n$-types $\mathcal{P}_n^{\mathcal{X},\mathcal{Y}}$ is defined
analogously. The set of sequences $y^n$ that have a conditional
type $W$ given $x^n$ is denoted by $\mathcal{T}_W(x^n)$, and $\mathcal{P}_n^{\mathcal{Y}|\mathcal{X}}$ denotes
the set of empirical conditional distributions, i.e., the set of
$W \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$ such that $W = \hat{P}_{y^n|x^n}$ for some
$(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n$.

Finally, the following three standard type results are often 
used in our analysis. 

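Fact 1 ([7, Lemma 1.2.2]):

$$|\mathcal{P}_n^{\mathcal{X}}| \le (n + 1)^{|\mathcal{X}|}.$$

Fact 2 ([7, Lemma 1.2.6]): If $X^n$ is independent and iden-
tically distributed (i.i.d.) according to $P_1 \in \mathcal{P}^{\mathcal{X}}$, then

$$\frac{1}{(n + 1)^{|\mathcal{X}|}}\, e^{-nD(P_2\|P_1)} \le \mathbb{P}(X^n \in \mathcal{T}_{P_2}) \le e^{-nD(P_2\|P_1)}$$

for any $P_2 \in \mathcal{P}_n^{\mathcal{X}}$.

Fact 3 ([7, Lemma 1.2.6]): If the input $x^n \in \mathcal{X}^n$ to a
memoryless channel $Q \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$ has type $P \in \mathcal{P}_n^{\mathcal{X}}$, then the
probability of observing a channel output sequence $Y^n$ which
lies in $\mathcal{T}_W(x^n)$ satisfies

$$\frac{1}{(n + 1)^{|\mathcal{X}||\mathcal{Y}|}}\, e^{-nD(W\|Q|P)} \le \mathbb{P}(Y^n \in \mathcal{T}_W(x^n) \mid x^n) \le e^{-nD(W\|Q|P)}$$

for any $W \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$ such that $\mathcal{T}_W(x^n)$ is non-empty.

²When the sequence that induces the empirical type is clear from context,
we omit the subscript and write simply $\hat{P}$.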
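
The type definitions and Facts 1 and 2 can be checked numerically on small examples. The following Python sketch enumerates all $n$-types by brute force and computes the exact probability of a type class under an i.i.d. source; the ternary alphabet, the source $P_1$, the block length $n = 4$, and all function names are hypothetical choices made for illustration.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product

def empirical_type(seq, alphabet):
    """Return the type of seq as exact fractions, one entry per symbol."""
    counts = Counter(seq)
    return tuple(Fraction(counts[a], len(seq)) for a in alphabet)

def all_n_types(n, alphabet):
    """Enumerate the set of n-types by brute force over all sequences."""
    return {empirical_type(s, alphabet) for s in product(alphabet, repeat=n)}

def type_class_probability(counts, P1):
    """P(X^n in T_{P2}) for X^n i.i.d. P1, where counts[a] = n * P2(a)."""
    n = sum(counts.values())
    size = math.factorial(n)
    for c in counts.values():
        size //= math.factorial(c)  # multinomial coefficient |T_{P2}|
    prob = float(size)
    for a, c in counts.items():
        prob *= P1[a] ** c
    return prob

def divergence_from_counts(counts, P1):
    """D(P2 || P1) in nats, with P2(a) = counts[a] / n."""
    n = sum(counts.values())
    return sum((c / n) * math.log((c / n) / P1[a])
               for a, c in counts.items() if c > 0)

# Fact 1 on a hypothetical ternary alphabet with n = 4.
alphabet = (0, 1, 2)
n = 4
num_types = len(all_n_types(n, alphabet))
fact1_bound = (n + 1) ** len(alphabet)

# Fact 2 for a hypothetical source P1 and the type P2 = (1/2, 1/2, 0).
P1 = {0: 0.6, 1: 0.3, 2: 0.1}
counts = {0: 2, 1: 2, 2: 0}
prob = type_class_probability(counts, P1)
upper = math.exp(-n * divergence_from_counts(counts, P1))
lower = upper / fact1_bound
```

In this example the exact type-class probability lies between `lower` and `upper`, as Fact 2 requires.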



III. Model and Performance Criterion 

The asynchronous communication model of interest cap- 
tures the setting where infrequent delay-sensitive data must be 
reliably communicated. For a discussion of this model and its 
connections with related communication and statistical models 
we refer to [3, Section II]. 

We consider discrete-time communication without feedback 
over a discrete memoryless channel characterized by its finite
input and output alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively, and transition
probability matrix $Q(y|x)$, for all $y \in \mathcal{Y}$ and $x \in \mathcal{X}$. Without
loss of generality, we assume that for all $y \in \mathcal{Y}$ there is some
$x \in \mathcal{X}$ for which $Q(y|x) > 0$.

There are $M \ge 2$ messages $m \in \{1, 2, \ldots, M\}$. For each
message $m$, there is an associated codeword

$$c^n(m) \triangleq c_1(m)\, c_2(m) \cdots c_n(m),$$

which is a string of $n$ symbols drawn from $\mathcal{X}$. The $M$
codewords form a codebook $\mathcal{C}_n$ (whence $|\mathcal{C}_n| = M$). Com-
munication takes place as follows. The transmitter selects a
message $m$ randomly and uniformly over the message set and
starts sending the corresponding codeword $c^n(m)$ at a random
time $\nu$, unknown to the receiver, independent of $c^n(m)$, and
uniformly distributed over $\{1, 2, \ldots, A\}$, where $A = e^{n\alpha}$ is
referred to as the asynchronism level of the channel, with $\alpha$
termed the associated asynchronism exponent. The transmitter
and the receiver know the integer parameter $A \ge 1$. The
special case $A = 1$ (i.e., $\alpha = 0$) corresponds to the classical
synchronous communication scenario.

When a codeword is transmitted, a noise-corrupted version
of the codeword is obtained at the receiver. When the transmit-
ter is silent, the receiver observes only noise. To characterize
the output distribution when no input is provided to the
channel, we make use of a specially designated "no-input"
symbol $\star$ in the input alphabet $\mathcal{X}$, as depicted in Figs. 1 and 2.
Specifically,

$$Q_\star \triangleq Q(\cdot|\star) \qquad (1)$$

characterizes the noise distribution of the channel. Hence,
conditioned on the value of $\nu$ and on the message $m$
to be conveyed, the receiver observes independent symbols
$Y_1, Y_2, \ldots, Y_{A+n-1}$ distributed as follows. If

$$t \in \{1, 2, \ldots, \nu - 1\}$$

or

$$t \in \{\nu + n, \nu + n + 1, \ldots, A + n - 1\},$$

the distribution of $Y_t$ is $Q_\star$. If

$$t \in \{\nu, \nu + 1, \ldots, \nu + n - 1\},$$

the distribution of $Y_t$ is $Q(\cdot|c_{t-\nu+1}(m))$. Note that since the
transmitter can choose to be silent for arbitrary portions of its 
length-n transmission as part of its message-encoding strategy, 
the symbol ★ is eligible for use in the codebook design. 
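
The output model described above is easy to simulate. The sketch below, written in Python under stated assumptions, draws the start time $\nu$ uniformly over $\{1, \ldots, A\}$ and generates $Y_1, \ldots, Y_{A+n-1}$; the label `STAR` for the no-input symbol, the function names, and the Z-like toy channel are hypothetical and serve only to illustrate the model.

```python
import random

STAR = "*"  # hypothetical label for the no-input symbol

def sample_output(Q, x, rng):
    """Draw one channel output from Q(.|x)."""
    outputs = list(Q[x].keys())
    weights = [Q[x][y] for y in outputs]
    return rng.choices(outputs, weights=weights, k=1)[0]

def simulate_async_channel(Q, codeword, A, rng):
    """Return (nu, outputs), where outputs = [Y_1, ..., Y_{A+n-1}]."""
    n = len(codeword)
    nu = rng.randint(1, A)  # uniform over {1, ..., A}, both ends included
    outputs = []
    for t in range(1, A + n):
        if nu <= t <= nu + n - 1:
            x = codeword[t - nu]  # c_{t-nu+1}(m), stored at index t - nu
        else:
            x = STAR              # transmitter idle: pure noise Q_star
        outputs.append(sample_output(Q, x, rng))
    return nu, outputs

# Toy channel: noise always produces 0, the input 1 produces 1 w.p. 0.8.
z_channel = {STAR: {0: 1.0, 1: 0.0}, 1: {0: 0.2, 1: 0.8}}
rng = random.Random(0)
# The codeword may itself contain the no-input symbol.
nu, outputs = simulate_async_channel(z_channel, [1, 1, STAR, 1], A=20, rng=rng)
```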

The decoder takes the form of a sequential test $(\tau, \phi)$, where
$\tau$ is a stopping time, bounded by $A + n - 1$, with respect
to the output sequence $Y_1, Y_2, \ldots$ indicating when decoding
happens, and where $\phi$ denotes a decision rule that declares
the decoded message; see Fig. 2. Recall that a stopping time
$\tau$ (deterministic or randomized) is an integer-valued random
variable with respect to a sequence of random variables
$\{Y_i\}_{i=1}^{\infty}$ so that the event $\{\tau = t\}$, conditioned on
$\{Y_i\}_{i=1}^{t}$, is independent of $\{Y_i\}_{i=t+1}^{\infty}$, for all $t \ge 1$. The function
$\phi$ is then defined as any $\mathcal{F}_\tau$-measurable map taking values
in $\{1, 2, \ldots, M\}$, where $\mathcal{F}_1, \mathcal{F}_2, \ldots$ is the natural filtration
induced by the process $Y_1, Y_2, \ldots$.

A code is an encoder/decoder pair $(\mathcal{C}, (\tau, \phi))$.

Fig. 1. Graphical depiction of the transmission matrix for an asynchronous
discrete memoryless channel. The "no-input" symbol $\star$ is used to characterize
the channel output when the transmitter is silent.

Fig. 2. Temporal representation of the channel input sequence (upper axis)
and channel output sequence (lower axis). At time $\nu$ message $m$ starts being
sent and decoding occurs at time $\tau$. Since $\nu$ is unknown at the receiver,
the decoding time may be before the entire codeword has been received,
potentially (but not necessarily) resulting in a decoding error.

The performance of a code operating over an asynchronous
channel is quantified as follows. First, we define the maximum
(over messages), time-averaged decoding error probability⁴

$$\mathbb{P}(\mathcal{E}) \triangleq \max_{m} \frac{1}{A} \sum_{t=1}^{A} \mathbb{P}_{m,t}(\mathcal{E}), \qquad (2)$$

where $\mathcal{E}$ indicates the event that the decoded message does not
correspond to the sent message, and where the subscripts $m, t$
indicate the conditioning on the event that message $m$ starts
being sent at time $\nu = t$. Note that by definition we have

$$\mathbb{P}_{m,t}(\mathcal{E}) = \mathbb{P}_{m,t}(\phi(Y^{\tau}) \ne m).$$

Second, we define communication rate with respect to the
average elapsed time between the time the codeword starts
being sent and the time the decoder makes a decision, i.e.,

$$R \triangleq \frac{\ln M}{\Delta}, \qquad (3)$$

where

$$\Delta \triangleq \max_{m} \frac{1}{A} \sum_{t=1}^{A} \mathbb{E}_{m,t}(\tau_n - t)^+, \qquad (4)$$

where $x^+$ denotes $\max\{0, x\}$, and where $\mathbb{E}_{m,t}$ denotes the
expectation with respect to $\mathbb{P}_{m,t}$.⁵

³Note that the proposed asynchronous discrete-time communication model
still assumes some degree of synchronization since transmitter and receiver
are supposed to have access to clocks ticking in unison. This is sometimes
referred to as frame asynchronous symbol synchronous communication.

⁴Note that there is a small abuse of notation as $\mathbb{P}(\mathcal{E})$ need not be a
probability.
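
To make definitions (3) and (4) concrete, the following small Python sketch computes the delay $\Delta$ and the rate $R$ from a table of per-message, per-start-time expected delays. The table `delay_table`, its numerical entries, and the function names are hypothetical inputs chosen for illustration; in practice these expectations would come from analysis or simulation of a specific decoder.

```python
import math

def expected_delay(delay_table):
    """Delta = max_m (1/A) sum_t E_{m,t}(tau - t)^+.

    delay_table[m][t] is assumed to hold E_{m,t}(tau - t)^+ for message
    index m and start-time index t (A entries per message).
    """
    return max(sum(row) / len(row) for row in delay_table)

def communication_rate(num_messages, delay_table):
    """R = ln(M) / Delta, in nats per channel use."""
    return math.log(num_messages) / expected_delay(delay_table)

# Hypothetical example with M = 4 messages and A = 3 start times.
delay_table = [
    [10.0, 10.0, 10.0],
    [12.0, 12.0, 12.0],
    [8.0, 8.0, 8.0],
    [10.0, 11.0, 9.0],
]
delta = expected_delay(delay_table)        # worst message dominates
rate = communication_rate(4, delay_table)  # ln(4) / delta
```

Note that the maximum over messages means a single slow message determines the rate, which is why the second row fixes $\Delta$ in this example.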

With these definitions, the class of communication strategies 
of interest is as follows. 

Definition 1 ($(R, \alpha)$ Coding Scheme): A pair $(R, \alpha)$ with
$R \ge 0$ and $\alpha \ge 0$ is achievable if there exists a sequence
$\{(\mathcal{C}_n, (\tau_n, \phi_n))\}_{n \ge 1}$ of codes, indexed by the codebook length
$n$, that asymptotically achieves a rate $R$ at an asynchronism
exponent $\alpha$. This means that for any $\epsilon > 0$ and every $n$ large
enough, the code $(\mathcal{C}_n, (\tau_n, \phi_n))$

1) operates under asynchronism level $A_n = e^{(\alpha - \epsilon)n}$;

2) yields a rate at least equal to $R - \epsilon$;

3) achieves a maximum error probability of at most $\epsilon$.

An $(R, \alpha)$ coding scheme is a sequence $\{(\mathcal{C}_n, (\tau_n, \phi_n))\}_{n \ge 1}$
that achieves the rate-exponent pair $(R, \alpha)$.

In turn, capacity for our model is defined as follows. 

Definition 2 (Asynchronous Capacity): For given $\alpha \ge 0$,
the asynchronous capacity $R(\alpha)$ is the supremum of the set
of rates that are achievable at asynchronism exponent $\alpha$.
Equivalently, the asynchronous capacity is characterized by
$\alpha(R)$, defined as the supremum of the set of asynchronism
exponents that are achievable at rate $R \ge 0$.

Accordingly, we use the term "asynchronous capacity" to
designate either $R(\alpha)$ or $\alpha(R)$. While $R(\alpha)$ may have the
more natural immediate interpretation, most of our results are
more conveniently expressed in terms of $\alpha(R)$.

In agreement with our notational convention, the capacity of
the synchronous channel, which corresponds to the case where
$\alpha = 0$, is simply denoted by $C$ instead of $R(0)$. Throughout
the paper we only consider channels with $C > 0$.

Remark 1: One could alternatively consider the rate with 
respect to the duration the transmitter occupies the channel 
and define it with respect to the block length n. In this case 
capacity is a special case of the general asynchronous capacity 
per unit cost result [5, Theorem 1].

In [3], [9] it is shown that reliable communication is possible
if and only if the asynchronism exponent $\alpha$ does not exceed
a limit referred to as the "synchronization threshold."

Theorem 1 ([3, Theorem 2], [9]): If the asynchronism ex-
ponent is strictly smaller than the synchronization threshold

$$\alpha_\circ \triangleq \max_{x} D(Q(\cdot|x)\|Q_\star) = \alpha(R = 0),$$

then there exists a coding scheme $\{(\mathcal{C}_n, (\tau_n, \phi_n))\}_{n \ge 1}$ that
achieves a maximum error probability tending to zero as $n \to \infty$.

Conversely, any coding scheme $\{(\mathcal{C}_n, (\tau_n, \phi_n))\}_{n \ge 1}$ that
operates at an asynchronism exponent strictly greater than the
synchronization threshold, achieves (as $n \to \infty$) a maximum
probability of error equal to one.

Moreover,⁶

$$\alpha_\circ > 0 \quad \text{if and only if} \quad C > 0.$$

⁵Note that $\mathbb{E}_{m,t}(\tau_n - t)^+$ should be interpreted as $\mathbb{E}_{m,t}((\tau_n - t)^+)$.
⁶This claim appeared in [3, p. 4515].

A few comments are in order. The cause of unreliable
communication above the synchronization threshold is the
following. When asynchronism is so large, with probability
approaching one pure noise mimics a codeword for any
codebook (regardless of the rate) before the actual codeword
even starts being sent.⁷ This results in an error probability of
at least 1/2 since, by our model assumption, the message set
contains at least two messages. On the other hand, below the
synchronization threshold reliable communication is possible.
If the codebook is properly chosen, the noise won't mimic any
codeword with probability tending to one, which allows the
decoder to reliably detect the sent message.
Note that

$$\alpha_\circ = \infty$$

if and only if pure noise can't generate all channel outputs,
i.e., if and only if $Q_\star(y) = 0$ for some $y \in \mathcal{Y}$. Indeed, in this
case it is possible to avoid the previously mentioned decod-
ing confusion by designing codewords (partly) composed of
symbols that generate channel outputs which are impossible 
to generate with pure noise. 
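
The synchronization threshold of Theorem 1 is a maximum of divergences and can be evaluated directly for small channels. The Python sketch below does so for two toy channels, one where noise can produce every output (finite threshold) and one where it cannot (infinite threshold); the channel matrices, the label `STAR`, and the function names are illustrative assumptions.

```python
import math

def kl_divergence(P1, P2):
    """D(P1 || P2) in nats; infinite if P1 puts mass where P2 has none."""
    total = 0.0
    for symbol, p in P1.items():
        if p == 0:
            continue
        q = P2.get(symbol, 0.0)
        if q == 0:
            return math.inf
        total += p * math.log(p / q)
    return total

def synchronization_threshold(Q, star):
    """alpha_o = max_x D(Q(.|x) || Q_star) for a channel given as Q[x][y]."""
    q_star = Q[star]
    return max(kl_divergence(Q[x], q_star) for x in Q)

STAR = "*"  # hypothetical label for the no-input symbol

# Noise can generate both outputs: the threshold is finite.
noisy = {STAR: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
finite_threshold = synchronization_threshold(noisy, STAR)

# Noise never generates output 1: the threshold is infinite.
z_like = {STAR: {0: 1.0, 1: 0.0}, 1: {0: 0.1, 1: 0.9}}
infinite_threshold = synchronization_threshold(z_like, STAR)
```

For the first channel the threshold equals $0.8 \ln 9$, since the two rows are mirror images of each other.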

The last claim in Theorem 1 says that reliable asynchronous
communication is possible if and only if reliable synchronous 
communication is possible. That the former implies the latter 
is obvious since asynchronism can only hurt communication. 
That the latter implies the former is perhaps less obvious, and a 
high-level justification is as follows. When C > 0, at least two 
channel inputs yield different conditional output distributions, 
for otherwise the input-output mutual information is zero re-
gardless of the input distribution. Hence, $Q_\star \ne Q(\cdot|x)$ for
some $x \ne \star$. Now, by designing codewords mainly composed
of $x$ it is possible to reliably signal the codeword's location to
the decoder even under an exponential asynchronism, since the 
channel outputs look statistically different than noise during 
the message transmission. Moreover, if the message set is 
small enough, it is possible to guarantee reliable message 
location and successfully identify which message from the 
message set was sent. Therefore, exponential asynchronism 
can be accommodated, hence $\alpha_\circ > 0$.

Finally, it should be pointed out that in [3] all the results
are stated with respect to average (over messages) delay and
error probability in place of maximum (over messages) delay
and error probability as in this paper. Nevertheless, the same
results hold in the latter case as discussed briefly later at the
end of Section V.

IV. Main Results 

This section is divided into two parts. In Section IV-A
we provide general upper and lower bounds on capacity,
and derive several of its properties. In Section IV-B we
investigate the performance limits of training-based schemes
and establish their suboptimality in a certain communication
regime. Since both sections can be read independently, the
practically inclined reader may read Section IV-B first.

All of our results assume a uniform distribution on $\nu$.
Nevertheless, this assumption is not critical in our proofs.
The results can be extended to non-uniform distributions by
following the same arguments as those used to establish
asynchronous capacity per unit cost for non-uniform $\nu$ [5,
Theorem 5].

⁷This follows from the converse of [9, Theorem], which says that above
$\alpha_\circ$, even the codeword of a single codeword codebook is mislocated with
probability tending to one.

A. General Bounds on Asynchronous Capacity 

A decoder at the output of an asynchronous channel should 
discriminate between hypothesis "noise" and hypothesis "mes- 
sage," which correspond to the situations when the transmitter 
is idle and when it transmits a codeword, respectively. Intu- 
itively, the more these hypotheses are statistically far apart — 
by means of an appropriate codebook design — the larger the 
level of asynchronism which can be accommodated for a given 
communication rate. 

More specifically, a code should serve the dual purpose of 
minimizing the "false-alarm" and "miss" error probabilities. 

False-alarm refers to the event where the decoder outputs
a message before a message is sent. As such, this event
contributes to increasing the rate (since the rate is defined with
respect to the receiver's decoding delay $\mathbb{E}(\tau - \nu)^+$) at the expense
of the error probability. As an extreme case, by immediately
decoding, i.e., by setting $\tau = 1$, we get an infinite rate
and an error probability (asymptotically) equal to one. As it turns
out, the false-alarm probability should be exponentially small
to allow reliable communication under exponential asynchro-
nism.

The miss event refers to the scenario where the decoder 
fails to recognize the sent message during transmission, i.e., 
the message output looks like it was generated by noise. This 
event impacts the rate and, to a smaller extent, also the error 
probability. In fact, when the sent message is missed, the 
reaction delay is usually huge, of the order of A. Therefore, 
to guarantee a positive rate under exponential asynchronism 
the miss error probability should also be exponentially small. 

Theorem 2 below provides a lower bound on the asyn-
chronous capacity. The proof of this theorem is obtained by
analyzing a coding scheme which performs synchronization
and information transmission jointly. The codebook is a stan-
dard i.i.d. random code across time and messages and its
performance is governed by the Chernoff error exponents for
discriminating hypothesis "noise" from hypothesis "message."

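Theorem 2 (Lower Bound on Asynchronous Capacity):
Let $\alpha \ge 0$ and let $P \in \mathcal{P}^{\mathcal{X}}$ be some input distribution such
that at least one of the following inequalities

$$D(V\|(PQ)_{\mathcal{Y}}) \ge \alpha$$

$$D(V\|Q_\star) \ge \alpha$$

holds for all distributions $V \in \mathcal{P}^{\mathcal{Y}}$, i.e.,

$$\min_{V \in \mathcal{P}^{\mathcal{Y}}} \max\{D(V\|(PQ)_{\mathcal{Y}}),\, D(V\|Q_\star)\} \ge \alpha.$$

Then, the rate-exponent pair $(R = I(PQ), \alpha)$ is achievable.
Thus, maximizing over all possible input distributions, we have
the following lower bound on $\alpha(R)$ in Definition 2:

$$\alpha(R) \ge \alpha_-(R), \quad R \in (0, C], \qquad (5)$$

where

$$\alpha_-(R) \triangleq \max_{\{P \in \mathcal{P}^{\mathcal{X}} :\, I(PQ) \ge R\}}\ \min_{V \in \mathcal{P}^{\mathcal{Y}}} \max\{D(V\|(PQ)_{\mathcal{Y}}),\, D(V\|Q_\star)\}. \qquad (6)$$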




Fig. 3. If $\alpha$ is at most the "half-distance" between distributions $(PQ)_{\mathcal{Y}}$ and
$Q_\star$, then $(\alpha, R)$ with $R = I(PQ)$ is achievable.



The analysis of the coding scheme that yields Theorem 2 is
actually tight in the sense that the coding scheme achieves (6)
with equality (see proof of Theorem 2 and remark p. 14).

Theorem 2 provides a simple explicit lower bound on
capacity. The distribution $(PQ)_{\mathcal{Y}}$ corresponds to the channel
output when the input to the channel is distributed according to
$P$. The asynchronism exponent that can be accommodated for
given $P$ and $Q_\star$ can be interpreted as being the "equidistant
point" between distributions $(PQ)_{\mathcal{Y}}$ and $Q_\star$, as depicted in
Fig. 3. Maximizing over $P$ such that $I(PQ) \ge R$ gives
the largest such exponent that can be achieved for rate $R$
communication.
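
For a channel with binary output, the "equidistant point" in (6) can be found by a one-dimensional search, because the difference $D(V\|(PQ)_{\mathcal{Y}}) - D(V\|Q_\star)$ is linear in $V$. The Python sketch below evaluates the lower bound $\alpha_-(R)$ by a grid search over input distributions for a toy channel with inputs $\{\star, 1\}$ and outputs $\{0, 1\}$. The restriction to this binary case, the grid resolution, the bisection routine, the channel parameters, and all function names are assumptions made for illustration; the paper does not prescribe a numerical procedure.

```python
import math

def bern_kl(v, a):
    """D_B(v || a) in nats for 0 < a < 1, with the convention 0 ln 0 = 0."""
    total = 0.0
    if v > 0:
        total += v * math.log(v / a)
    if v < 1:
        total += (1 - v) * math.log((1 - v) / (1 - a))
    return total

def equidistant_exponent(a, b, iters=100):
    """min_v max{D_B(v||a), D_B(v||b)}, by bisection on v between a and b."""
    lo, hi = min(a, b), max(a, b)
    if lo == hi:
        return 0.0
    left, right = lo, hi
    for _ in range(iters):
        mid = 0.5 * (left + right)
        # D_B(mid||lo) - D_B(mid||hi) is increasing in mid on [lo, hi].
        if bern_kl(mid, lo) < bern_kl(mid, hi):
            left = mid
        else:
            right = mid
    v = 0.5 * (left + right)
    return max(bern_kl(v, lo), bern_kl(v, hi))

def lower_bound_exponent(q_star, q_one, R, grid=1000):
    """Grid-search value of alpha_-(R) for inputs {star, 1}, outputs {0, 1}.

    q_star = Q(1|star) and q_one = Q(1|1), both assumed to lie in (0, 1).
    Returns None if no grid point satisfies I(PQ) >= R.
    """
    best = None
    for k in range(grid + 1):
        p = k / grid                              # P(1)
        a = (1 - p) * q_star + p * q_one          # (PQ)_Y(1)
        info = (1 - p) * bern_kl(q_star, a) + p * bern_kl(q_one, a)
        if info >= R:
            value = equidistant_exponent(a, q_star)
            if best is None or value > best:
                best = value
    return best

# Hypothetical channel: Q(1|star) = 0.1 and Q(1|1) = 0.9.
alpha_low_rate = lower_bound_exponent(0.1, 0.9, R=0.05)
alpha_high_rate = lower_bound_exponent(0.1, 0.9, R=0.30)
```

Because the feasible set of input distributions shrinks as $R$ grows, the computed exponent is non-increasing in $R$, and no exponent is returned for rates above the synchronous capacity of the toy channel.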

Note that (6) is much simpler to evaluate than the lower
bound given by [3, Theorem 2]. Moreover, the former is
usually a better bound than the latter and it exhibits an
interesting feature of $\alpha(R)$ in the high rate regime. This feature
is illustrated in Example 1 to come.

Theorem 2 extends to the following continuous alphabet
Gaussian setting:

Corollary 1 (Asynchronous Gaussian channel): Suppose
that for a real input $x$ the decoder receives $Y = x + Z$, where
$Z \sim \mathcal{N}(0, 1)$. When there is no input to the channel, $Y = Z$,
so $Q_\star = \mathcal{N}(0, 1)$. The input is power constrained so that
all codewords $c^n(m)$ must satisfy $\frac{1}{n} \sum_{i=1}^{n} c_i(m)^2 \le p$ for a
given constant $p > 0$. For this channel we have

$$\alpha(R) \ge \max_{P :\, I(PQ) \ge R,\ \mathbb{E}_P X^2 \le p}\ \min_{V} \max\{D(V\|(PQ)_{\mathcal{Y}}),\, D(V\|Q_\star)\}, \qquad (7)$$

for $R \in (0, C]$ where $P$ and $V$ in the optimization are
distributions over the reals.

If we restrict the outer maximization in (7) to be over Gaussian
distributions only, it can be shown that the best input has a
mean $\mu$ that is as large as possible, given the rate and power
constraints. More precisely, $\mu$ and $R$ satisfy

$$R = \frac{1}{2} \ln(1 + p - \mu^2),$$

and the variance of the optimal Gaussian input is $p - \mu^2$. The
intuition for choosing such parameters is that a large mean
helps the decoder to distinguish the codeword from noise,
since the latter has a mean equal to zero. What limits the
mean is both the power constraint and the variance needed to
ensure sufficient mutual information to support communication
at rate $R$.
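
The relation between $R$, $p$, and $\mu$ stated above can be inverted in closed form: the variance is $e^{2R} - 1$ and the squared mean is whatever power remains. The Python sketch below carries out this arithmetic; the function name and the example values $p = 3$ and $R = \frac{1}{2}\ln 2$ are hypothetical.

```python
import math

def gaussian_input_parameters(R, p):
    """Return (mean, variance) of the Gaussian input with the largest mean
    that satisfies R = 0.5 * ln(1 + p - mean^2) under power constraint p.
    """
    capacity = 0.5 * math.log(1 + p)
    if not 0 < R <= capacity:
        raise ValueError("rate must lie in (0, C] with C = 0.5 * ln(1 + p)")
    variance = math.exp(2 * R) - 1            # p - mean^2
    mean = math.sqrt(max(p - variance, 0.0))  # guard against rounding
    return mean, variance

# Hypothetical example: power p = 3 and rate R = 0.5 * ln 2.
mean, variance = gaussian_input_parameters(0.5 * math.log(2), 3.0)
```

At $R = C$ all the power goes into the variance and the mean is zero, while at lower rates more of the power budget can be shifted into the mean.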
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Proof of CorollaryU] The proof uses a standard quantiza- 
tion argument similar to that in flO\, and therefore we provide 
only a sketch of the proof. From the given the continuous time 
Gaussian channel, we can form a discrete alphabet channel for 
which we can apply Theorem |2] 

More specifically, pick a discrete input distribution $P$ that 
satisfies the power constraint. The output is discretized within 
$[-L/2, L/2]$ into constant size $\Delta$ intervals so that $L \to \infty$ 
as $\Delta \to 0$. The output of the quantized channel corresponds 
to the mid-value of the interval which contains the output of 
the Gaussian channel. If the output of the Gaussian channel 
falls below $-L/2$, the quantized value is set to be $-L/2$, 
and if the output of the Gaussian channel falls above $L/2$, the 
quantized value is set to be $L/2$. 
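
The quantization rule just described can be sketched as follows. This is only an illustration under stated assumptions: the function name `quantize_output` is hypothetical, and $L$ is assumed to be an integer multiple of $\Delta$ so that the intervals tile $[-L/2, L/2]$ exactly.

```python
def quantize_output(y, L, delta):
    """Quantize a real channel output y as in the proof sketch.

    Outputs inside [-L/2, L/2] map to the mid-value of their interval of
    width delta; outputs outside are clipped to -L/2 or L/2.
    Assumes L is an integer multiple of delta.
    """
    half = L / 2.0
    if y < -half:
        return -half
    if y > half:
        return half
    num_cells = max(1, round(L / delta))
    # Index of the interval containing y; the right endpoint L/2 is
    # assigned to the last interval.
    index = min(int((y + half) // delta), num_cells - 1)
    return -half + (index + 0.5) * delta
```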

For each quantized channel we apply Theorem 2, then let 
$\Delta$ tend to zero. One can then verify that the achieved bound 
corresponds to (7), which shows that Theorem 2 also holds 
for the continuous alphabet Gaussian setting of Corollary 1. ■ 

The next result provides an upper bound to the asynchronous 
capacity for channels with finite synchronization 
threshold (see Theorem 1). 

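Theorem 3 (Upper Bound on Asynchronous Capacity): 
For any channel $Q$ such that $\alpha_\circ < \infty$, and any $R > 0$, we 
have that 

$$\alpha(R) \le \max_{\mathcal{S}} \min\{\alpha_1, \alpha_2\} \triangleq \alpha_+(R), \tag{8}$$

where 

$$\alpha_1 \triangleq \delta\big(I(P_1 Q) - R + D((P_1 Q)_Y \| Q_\star)\big) \tag{9}$$
$$\alpha_2 \triangleq \min_{W} \max\{D(W\|Q|P_2),\ D(W\|Q_\star|P_2)\} \tag{10}$$

with 

$$\mathcal{S} \triangleq \big\{(P_1, P_2, P_1', \delta) \in (\mathcal{P}^{\mathcal{X}})^3 \times [0,1] :\ I(P_1 Q) \ge R,\ P_2 = \delta P_1 + (1 - \delta)P_1'\big\}. \tag{11}$$

If $\alpha_\circ = \infty$, then 

$$\alpha(R) \le \max_{P_2} \alpha_2 \tag{12}$$

for $R \in (0, C]$. 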

The terms $\alpha_1$ and $\alpha_2$ in (8) reflect the false-alarm and 
miss constraints alluded to above (see discussion before 
Theorem 2). If $\alpha > \alpha_1$, then with high probability the noise will 
mimic a message before transmission starts. Instead, if $\alpha > \alpha_2$ 
then reliable communication at a positive rate is impossible 
since no code can guarantee a sufficiently low probability of 
missing the sent codeword. 

The parameter $\delta$ in (9) and (11) essentially represents 
the ratio between the reaction delay $\mathbb{E}(\tau - \nu)^+$ and the 
blocklength, which need not coincide. Loosely speaking, for 
a given asynchronism level a smaller $\delta$, or, equivalently, a 
smaller $\mathbb{E}(\tau - \nu)^+$, increases the communication rate at the 
expense of a higher false-alarm error probability. The intuition 
for this is that a decoder that achieves a smaller reaction delay 
sees, on average, "fewer" channel outputs before stopping. 
As a consequence, the noise is more likely to lead such 
a decoder into confusion. A similar tension arises between 
communication rate and the miss error probability. The 
optimization over the set $\mathcal{S}$ attempts to strike the optimal tradeoff 
between the communication rate, the false-alarm and miss 
error probabilities, as well as the reaction delay as a fraction 
of the codeword length. 

Fig. 4. A channel for which $\alpha(R)$ is discontinuous at $R = C$. 

Fig. 5. Capacity upper and lower bounds on the asynchronous capacity of 
the channel of Fig. 4 with $\epsilon = 0.1$ and $\star = 0$. $\alpha_-(R)$ represents the lower 
bound given by Theorem 2, LB[3] represents the lower bound obtained in [3, 
Theorem 1], and $\alpha_+(R)$ represents the upper bound given by Theorem 3. 

For channels with infinite synchronization threshold, 
Theorem 4 to come establishes that the bound given by (12) is 
actually tight. 

The following examples provide some useful insights. 

Example 1: Consider the binary symmetric channel 
depicted in Fig. 4, which has the property that when no input is 
supplied to the channel, the output distribution is asymmetric. 
For this channel, in Fig. 5 we plot the lower bound on $\alpha(R)$ 
given by (6) (curve $\alpha_-(R)$) and the lower bound given by 
[3, Theorem 1] (the dashed line LB[3]).$^{8}$ The $\alpha_+(R)$ curve 
corresponds to the upper bound on $\alpha(R)$ given by Theorem 3. 
For these plots, the channel parameter is $\epsilon = 0.1$. 

The discontinuity of $\alpha(R)$ at $R = C$ (since $\alpha(R)$ is clearly 
equal to zero for $R > C$) implies that we do not need to back 
off from the synchronous capacity in order to operate under 
exponential asynchronism.$^{9}$ 

$^{8}$ Due to the complexity of evaluating the lower bound given by [3, 
Theorem 1], the curves labeled LB[3] are actually upper bounds on this lower 
bound. We believe these bounds are fairly tight, but in any case we see that 
the resulting upper bounds are below the lower bounds given by (6). 

Fig. 6. Channel for which $\alpha(R)$ is continuous at $R = C$. 

Note next that $\alpha_-(R)$ is better than LB[3] for all rates. 
In fact, empirical evidence suggests that $\alpha_-(R)$ is better than 
LB[3] in general. Additionally, note that $\alpha_-(R)$ and $\alpha_+(R)$ 
are not tight. 

Next, we show how another binary symmetric channel has 
some rather different properties. 

Example 2: Consider the binary symmetric channel 
depicted in Fig. 6, which has the property that when no input is 
provided to the channel the output distribution is symmetric. 
When used synchronously, this channel and that of Example 1 
are completely equivalent, regardless of the crossover 
probability $\epsilon$. Indeed, since the $\star$ input symbol in Fig. 6 produces 
0 and 1 equiprobably, this input can be ignored for coding 
purposes and any code for this channel achieves the same 
performance on the channel in Fig. 4. 

However, this equivalence no longer holds when the 
channels are used asynchronously. To see this, we plot the 
corresponding upper and lower bounds on performance for this 
channel in Fig. 7. Comparing curve $\alpha_-(R)$ in Fig. 5 with 
curve $\alpha_+(R)$ in Fig. 7, we see that asynchronous capacity 
for the channel of Fig. 4 is always larger than that of the 
current example. Moreover, since there is no discontinuity in 
exponent at $R = C$ in our current example, the difference is 
pronounced at $R = C = 0.368\ldots$; for the channel of Fig. 4 
we have $\alpha(C) \approx 0.12 > 0$. 
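
The value $C = 0.368\ldots$ quoted above is consistent with the synchronous capacity of a binary symmetric channel with crossover probability 0.1 measured in nats. A minimal check, with hypothetical helper names `binary_entropy` and `bsc_capacity` that do not appear in the original text:

```python
import math


def binary_entropy(eps):
    """Binary entropy of eps in nats."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log(eps) - (1.0 - eps) * math.log(1.0 - eps)


def bsc_capacity(eps):
    """Synchronous capacity, in nats, of a binary symmetric channel."""
    return math.log(2.0) - binary_entropy(eps)
```

Evaluating `bsc_capacity(0.1)` gives approximately 0.368.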

The discontinuity of $\alpha(R)$ at $R = C$ observed in Example 1 
is in fact typical, holding in all but one special case. 

Corollary 2 (Discontinuity of $\alpha(R)$ at $R = C$): We have 
$\alpha(C) = 0$ if and only if $Q_\star$ corresponds to the (unique) 
capacity-achieving output distribution of the synchronous 
channel. 

By Corollary 2, for the binary symmetric channel of 
Example 1, $\alpha(R)$ is discontinuous at $R = C$ whenever $\epsilon \ne 1/2$. To 
see this, note that the capacity achieving output distribution 
of the synchronous channel assigns equal weights to 0 and 1, 
differently than $Q_\star$. 

The justification for the discontinuity in Example 1 is as 
follows. Since the capacity-achieving output distribution of 
the synchronous channel (Bernoulli(1/2)) is "biased" with 
respect to the noise distribution $Q_\star$, hypotheses "message" and 
"noise" can be discriminated with exponentially small error 
probabilities. This, in turn, enables reliable detection of the 
sent message under exponential asynchronism. By contrast, 
for the channel of Example 2, $\alpha(R)$ is continuous at $R = C$, 
regardless of $\epsilon$. 

$^{9}$ To have a better sense of what it means to be able to decode under 
exponential asynchronism and, more specifically, at $R = C$, consider the 
following numerical example. Consider a codeword length $n$ equal to 150. 
Then $\alpha = 0.12$ yields asynchronism level $A = e^{\alpha n} \approx 6.5 \times 10^{7}$. If the 
codeword is, say, 30 centimeters long, then this means that the decoder can 
reliably sequentially decode the sent message, with minimal delay (were the 
decoder cognizant of $\nu$, it couldn't achieve a smaller decoding delay since we 
operate at the synchronous capacity), within 130 kilometers of mostly noisy 
data! 

Fig. 7. Capacity upper and lower bounds on the asynchronous capacity of 
the channel of Fig. 6 with $\epsilon = 0.1$. $\alpha_-(R)$ represents the lower bound given 
by Theorem 2, LB[3] represents the lower bound obtained in [3, Theorem 1], 
and $\alpha_+(R)$ represents the upper bound given by Theorem 3. 
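
The figures in the numerical footnote on decoding at $R = C$ (codeword length 150, exponent 0.12, a 30 centimeter codeword) can be reproduced with a short computation. The helper names below are hypothetical and serve only this illustration.

```python
import math


def asynchronism_level(alpha, n):
    """Asynchronism level A = exp(alpha * n)."""
    return math.exp(alpha * n)


def noisy_stream_length_km(alpha, n, codeword_length_cm):
    """Physical length, in kilometers, of A symbols when n symbols
    occupy codeword_length_cm centimeters."""
    symbol_length_cm = codeword_length_cm / n
    return asynchronism_level(alpha, n) * symbol_length_cm / 1e5
```

With `alpha = 0.12`, `n = 150` and a 30 cm codeword, the level is about $6.5 \times 10^{7}$ symbols and the stream is roughly 130 km long, in agreement with the footnote.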

Proof of Corollary 2: From Theorem 2, a strictly positive 
asynchronism exponent can be achieved at $R = C$ if $Q_\star$ differs 
from the synchronous capacity-achieving output distribution: 
(6) is strictly positive for $R = C$ whenever $Q_\star$ differs from 
the synchronous capacity-achieving output distribution since 
the divergence between two distributions is zero only if they 
are equal. 

Conversely, suppose $Q_\star$ is equal to the capacity-achieving 
output distribution of the synchronous channel. We show that 
for any $(R, \alpha)$ coding scheme where $R = C$, $\alpha$ is necessarily 
equal to zero. 

From Theorem 3, 

$$\alpha(R) \le \max_{\mathcal{S}} \alpha_1$$

where $\mathcal{S}$ and $\alpha_1$ are given by (11) and (9), respectively. Since 
$R = C$, $I(P_1 Q) = C$, and since $Q_\star = (P_1 Q)_Y$, we have 
$D((P_1 Q)_Y \| Q_\star) = 0$. Therefore, $\alpha_1 = 0$ for any $\delta$, and we 
conclude that $\alpha(C) = 0$. ■ 

In addition to the discontinuity at $R = C$, $\alpha(R)$ may also 
be discontinuous at rate zero: 

Corollary 3 (Discontinuity of $\alpha(R)$ at $R = 0$): If 

$$\alpha_\circ > \max_{x \in \mathcal{X}} D(Q_\star \| Q(\cdot|x)), \tag{13}$$

then $\alpha(R)$ is discontinuous at rate $R = 0$. 

Example 3: Channels that satisfy (13) include those for 
which the following two conditions hold: $\star$ can't produce all 
channel outputs, and if a channel output can be produced by 
$\star$, then it can also be produced by any other input symbol. For 
these channels (13) holds trivially; the right-hand side term is 
finite and the left-hand side term is infinite. The simplest such 
channel is the Z-channel depicted in Fig. 8 with $\epsilon \in (0, 1)$. 

Fig. 8. Channel for which $\alpha(R)$ is discontinuous at $R = 0$, assuming 
$\epsilon \in (0, 1)$. 

Note that if $\epsilon = 0$, (13) doesn't hold since both the left-hand 
side term and the right-hand side term are infinite. In fact, if 
$\epsilon = 0$ then asynchronism doesn't impact communication; rates 
up to the synchronous capacity can be achieved regardless of 
the level of asynchronism, i.e., 

$$\alpha(R) = \alpha_\circ = \infty, \qquad R \in [0, C].$$

To see this, note that prepending a 1 to each codeword 
suffices to guarantee perfect synchronization without impacting 
rate (asymptotically). 

More generally, asynchronous capacity for channels with 
infinite synchronization threshold is established in Theorem 4 
to come. 

An intuitive justification for the possible discontinuity of 
$\alpha(R)$ at $R = 0$ is as follows. Consider a channel where $\star$ 
cannot produce all channel outputs (such as that depicted in 
Fig. 8). A natural encoding strategy is to start codewords with a 
common preamble whose possible channel outputs differ from 
the set of symbols that can be generated by $\star$. The remaining 
parts of the codewords are chosen to form, for instance, 
a good code for the synchronous channel. Whenever the 
decoder observes symbols that cannot be produced by noise 
(a clear sign of the preamble's presence), it stops and decodes 
the upcoming symbols. For this strategy, the probability of 
decoding before the message is actually sent is clearly zero. 
Also, the probability of wrong message isolation conditioned 
on correct preamble location can be made negligible by 
taking codewords long enough. Similarly, the probability of 
missing the preamble can be made negligible by using a 
long enough preamble. Thus, the error probability of this 
training-based scheme can be made negligible, regardless of 
the asynchronism level. 

The problem arises when we add a positive rate constraint, 
which translates into a delay constraint. Conditioned on 
missing the preamble, it can be shown that the delay $(\tau - \nu)^+$ 
is large, in fact of order $A$. It can be shown that if (13) 
holds, the probability of missing the preamble is larger than 
$1/A$. Therefore, a positive rate puts a limit on the maximum 
asynchronism level for which reliable communication can be 
guaranteed, and this limit can be smaller than $\alpha_\circ$. 

We note that it is an open question whether or not $\alpha(R)$ may 
be discontinuous at $R = 0$ for channels that do not satisfy (13). 

Theorem 4 provides an exact characterization of capacity for 
the class of channels with infinite synchronization threshold, 
i.e., whose noise distribution cannot produce all possible 
channel outputs. 

Theorem 4 (Capacity when $\alpha_\circ = \infty$): If $\alpha_\circ = \infty$, then 

$$\alpha(R) = \bar{\alpha} \tag{14}$$

for $R \in (0, C]$, where 

$$\bar{\alpha} \triangleq \max_{P} \min_{W} \max\{D(W\|Q|P),\ D(W\|Q_\star|P)\}.$$

Fig. 9. Typical shape of the capacity of an asynchronous channel $Q$ for 
which $\alpha_\circ = \infty$. 

Therefore, when $\alpha_\circ = \infty$, $\alpha(R)$ is actually a constant 
that does not depend on the rate, as Fig. 9 depicts. Phrased 
differently, $R(\alpha) = C$ up to $\alpha = \bar{\alpha}$. For $\alpha > \bar{\alpha}$ we have 
$R(\alpha) = 0$. 

Note that when $\alpha_\circ = \infty$, $\alpha(R)$ can be discontinuous at 
$R = 0$ since the right-hand side of (14) is upper bounded by 

$$\max_{x \in \mathcal{X}} D(Q_\star \| Q(\cdot|x)),$$

which can be finite.$^{10}$ 

We conclude this section with a result of independent 
interest related to synchronous communication, and which 
is obtained as a byproduct of the analysis used to prove 
Theorem 3. This result essentially says that any nontrivial 
fixed length codebook, i.e., that achieves a nontrivial error 
probability, contains a very good large (constant composition) 
sub-codebook, in the sense that its rate is almost the same as 
the original code, but its error probability decays exponentially 
with a suitable decoder. In the following theorem $(\mathcal{C}_n, \phi_n)$ 
denotes a standard code for a synchronous channel $Q$, with 
fixed length $n$ codewords and decoding happening at time $n$. 

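Theorem 5: Fix a channel $Q \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$, let $q \in (0, 1/2)$, and 
let $\epsilon, \gamma > 0$ be such that $\epsilon + \gamma \in (0, l)$ with $l \in (0, 1)$. If 
$(\mathcal{C}_n, \phi_n)$ is a code that achieves an error probability $\epsilon$, then 
there exists an $n_\circ(l, \gamma, q, |\mathcal{X}|, |\mathcal{Y}|)$ such that for all $n \ge n_\circ$ 
there exists $(\mathcal{C}_n', \phi_n')$ such that$^{11}$ 

1) $\mathcal{C}_n' \subset \mathcal{C}_n$, $\mathcal{C}_n'$ is constant composition; 

2) the maximum error probability is less than $\epsilon_n$ where 

$$\epsilon_n = 2(n+1)^{|\mathcal{X}| \cdot |\mathcal{Y}|} \exp\big(-n^{2q}/(2\ln 2)\big);$$

3) $\dfrac{\ln|\mathcal{C}_n'|}{n} \ge \dfrac{\ln|\mathcal{C}_n|}{n} - \gamma$. 

Theorem 5 is a stronger version of [7, Corollary 1.9, p. 107] 
and its proof amounts to a tightening of some of the arguments 
in the proof of the latter, but otherwise follows it closely. 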

B. Training-Based Schemes 

Practical solutions to asynchronous communication 
usually separate synchronization from information transmission. 
We investigate a very general class of such "training-based 
schemes" in which codewords are composed of two parts: 
a preamble that is common to all codewords, followed by 
information symbols. The decoder first attempts to detect the 
preamble, then decodes the information symbols. The results 
in this section show that such schemes are suboptimal at least 
in certain communication regimes. This leads to the 
conclusion that the separation of synchronization and information 
transmission is in general not optimal. 

$^{10}$ To see this choose $W = Q_\star$ in the minimization (14). 
$^{11}$ We use $n_\circ(q)$ to denote some threshold index which could be explicitly 
given as a function of $q$. 

We start by defining a general class of training-based 
schemes: 

Definition 3 (Training-Based Scheme): A coding scheme 
$\{(\mathcal{C}_n, (\tau_n, \phi_n))\}_{n \ge 1}$ is said to be training-based if for some 
$\eta \in [0, 1]$ and all $n$ large enough 

1) there is a common preamble across codewords of size 
$\eta n$; 

2) the decoding time $\tau_n$ is such that the event 

$$\{\tau_n = t\},$$

conditioned on the $\eta n$ observations $Y_{t-n+1}^{t-n+\eta n}$, is 
independent of all other observations (i.e., $Y_1^{t-n}$ and 
$Y_{t-n+\eta n+1}^{A+n-1}$). 

Note that Definition 3 is in fact very general. The only 
restrictions are that the codewords all start with the same 
training sequence, and that the decoder's decision to stop at 
any particular time should be based on the processing of (at 
most) $\eta n$ past output symbols corresponding to the length of 
the preamble. 

In the sequel we use $\alpha^T(R)$ to denote the asynchronous 
capacity restricted to training based schemes. 

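Theorem 6 (Training-based scheme capacity bounds): 
Capacity restricted to training based schemes satisfies 

$$\alpha_-^T(R) \le \alpha^T(R) \le \alpha_+^T(R), \qquad R \in (0, C] \tag{15}$$

where 

$$\alpha_-^T(R) \triangleq m_1\left(1 - \frac{R}{C}\right), \qquad \alpha_+^T(R) \triangleq \min\left\{m_2\left(1 - \frac{R}{C}\right),\ \alpha_+(R)\right\},$$

where the constants $m_1$ and $m_2$ are defined as 

$$m_1 \triangleq \max_{P} \min_{W} \max\{D(W\|Q|P),\ D(W\|Q_\star|P)\}$$
$$m_2 \triangleq -\ln\big(\min_{y \in \mathcal{Y}} Q_\star(y)\big),$$

and where $\alpha_+(R)$ is defined in Theorem 3. 

Moreover, a rate $R \in [0, C]$ training-based scheme allocates 
at most a fraction 

$$\eta = 1 - \frac{R}{C}$$

to the preamble. 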
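
To make the constant $m_2$ and the $m_2(1 - R/C)$ term of the training-based upper bound concrete, here is a minimal sketch. The function names are hypothetical, the noise distribution is passed as a plain list of probabilities, and the example distribution $(0.9, 0.1)$ is an assumption standing in for a binary channel whose idle input is flipped with probability 0.1.

```python
import math


def m2(noise_dist):
    """m2 = -ln(min_y Q_star(y)); infinite if some output has zero probability."""
    smallest = min(noise_dist)
    if smallest == 0.0:
        return math.inf
    return -math.log(smallest)


def training_upper_term(rate, capacity, noise_dist):
    """The m2 * (1 - R/C) term of the training-based upper bound."""
    return m2(noise_dist) * (1.0 - rate / capacity)
```

For the assumed noise distribution $(0.9, 0.1)$ one gets $m_2 = \ln 10 \approx 2.30$, and the term decreases linearly to zero as the rate approaches the capacity.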

Since $m_2 < \infty$ if and only if $\alpha_\circ < \infty$, the upper bound in (15) 
implies: 

Corollary 4 (Asynchronism in the high rate regime): For 
training-based schemes 

$$\alpha^T(R) \xrightarrow{R \to C} 0$$

whenever $\alpha_\circ < \infty$. 

Fig. 10. Upper and lower bounds to capacity restricted to training-based 
schemes (TUB and TLB, respectively) for the binary symmetric channel 
depicted in Fig. 4 with $\epsilon = 0.1$. $\alpha_+(R)$ and $\alpha_-(R)$ represent the capacity 
general upper and lower bounds given by Theorems 2 and 3. 

In general, $\alpha(C) > 0$ as we saw in Corollary 2. Hence a 
direct consequence of Corollaries 2 and 4 is that training-based 
schemes are suboptimal in the high rate regime. Specifically, 
we have the following result. 

Corollary 5 (Suboptimality of training-based schemes): 
There exists a channel-dependent threshold $R^*$ such that for 
all $R > R^*$, 

$$\alpha^T(R) < \alpha(R)$$

except possibly when $Q_\star$ corresponds to the capacity-achieving 
output distribution of the synchronous channel, or 
when the channel is degenerate, i.e., when $\alpha_\circ = \infty$. 

The last claim of Theorem 6 says that the size of the preamble 
decreases (linearly) as the rate increases. This, in turn, implies 
that $\alpha^T(R)$ tends to zero as $R$ approaches $C$. Hence, in the 
high rate regime most of the symbols should carry information, 
and the decoder should try to detect these symbols as part 
of the decoding process. In other words, synchronization 
and information transmission should be jointly performed; 
transmitted bits should carry information while also helping 
the decoder to locate the sent codeword. 

If we are willing to reduce the rate, are training-based 
schemes still suboptimal? We do not have a definite answer 
to this question, but the following examples provide some 
insights. 

Example 4: Consider the channel depicted in Fig. 4 with 
$\epsilon = 0.1$. In Fig. 10 we plot the upper and lower bounds 
to capacity restricted to training-based schemes given by 
Theorem 6. TLB represents the lower bound in (15) and TUB 
represents the $m_2(1 - R/C)$ term in the upper bound (15). 
$\alpha_-(R)$ and $\alpha_+(R)$ represent the general lower and upper 
bounds to capacity given by Theorems 2 and 3; see Fig. 5. 

By comparing $\alpha_-(R)$ with TUB in Fig. 10 we observe that 
for rates above roughly 92% of the synchronous capacity $C$, 
training-based schemes are suboptimal. 

For this channel, we observe that $\alpha_-(R)$ is always above 
TLB. This feature does not generalize to arbitrary crossover 
probabilities $\epsilon$. Indeed, consider the channel in Fig. 4 but with 
an arbitrary crossover probability $\epsilon$, and let $r$ be an arbitrary 
constant such that $0 < r < 1$. From Theorem 6, training-based 
schemes can achieve rate asynchronism pairs $(R, \alpha)$ that 
satisfy 

$$\alpha \ge m_1(1 - R/C(\epsilon)), \qquad R \in (0, C(\epsilon)].$$

For the channel at hand 

$$m_1 = D_B(1/2 \| \epsilon),$$

hence $\alpha$ tends to infinity as $\epsilon \to 0$, for any fixed $R \in (0, r)$; 
note that $C(\epsilon) \to 1$ as $\epsilon \to 0$. 
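
The growth of $m_1 = D_B(1/2 \| \epsilon)$ as $\epsilon \to 0$ can be checked numerically. The sketch below computes the divergence between two Bernoulli distributions in nats; the function name `binary_divergence` is hypothetical.

```python
import math


def binary_divergence(a, b):
    """D_B(a || b): divergence in nats between Bernoulli(a) and Bernoulli(b)."""
    total = 0.0
    for p, q in ((a, b), (1.0 - a, 1.0 - b)):
        if p > 0.0:
            if q == 0.0:
                return math.inf
            total += p * math.log(p / q)
    return total
```

For example, `binary_divergence(0.5, 0.1)` is about 0.51, and the value increases without bound as the second argument shrinks toward zero.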

Now, consider the random coding scheme that yields 
Theorem 2. This scheme, which performs synchronization and 
information transmission jointly, achieves for any given rate 
$R \in [0, C]$ asynchronism exponent (see comment after 
Theorem 2) 

$$\max_{\{P \in \mathcal{P}^{\mathcal{X}}:\ I(PQ) \ge R\}}\ \min_{V}\ \max\{D(V\|(PQ)_Y),\ D(V\|Q_\star)\}.$$

This expression is upper-bounded by$^{12}$ 

$$\max_{P \in \mathcal{P}^{\mathcal{X}}:\ I(PQ) \ge R} D(Q_\star \| (PQ)_Y), \tag{16}$$

which is bounded in the limit $\epsilon \to 0$ as long as $R > 0$.$^{13}$ 
Therefore the joint synchronization-information transmission 
code yielding Theorem 2 can be outperformed by training-based 
schemes at moderate to low rate, even when the output 
distribution when no input is supplied is asymmetric. This 
shows that the general lower bound given by Theorem 2 is 
loose in general. 

Example 5: For the channel depicted in Fig. 6 with $\epsilon = 0.1$, 
in Fig. 11 we plot the upper and lower bounds on capacity 
restricted to training-based schemes, as given by Theorem 6. 
For this channel it turns out that the training-based scheme 
upper bound $m_2(1 - R/C)$ (see Theorem 6) is loose and hence 
TUB $= \alpha_+(R)$ for all rates. In contrast with the example of 
Fig. 10, here the general lower bound $\alpha_-(R)$ is below the 
lower bound for the best training-based schemes (TLB line). 

V. Analysis 

In this section, we establish the theorems of Section IV. 

A. Proof of Theorem 2 

Let $\alpha \ge 0$ and $P \in \mathcal{P}^{\mathcal{X}}$ satisfy the assumption of the 
theorem, i.e., be such that at least one of the following 
inequalities holds 

$$D(V\|(PQ)_Y) \ge \alpha, \qquad D(V\|Q_\star) \ge \alpha \tag{17}$$

for all distributions $V \in \mathcal{P}^{\mathcal{Y}}$, and let $A_n = e^{n(\alpha - \epsilon)}$. 

The proof is based on a random coding argument associated 
with the following communication strategy. The codebook 
$\mathcal{C} = \{c^n(m)\}_{m=1}^{M}$ is randomly generated so that all $c_i(m)$, 
$i \in \{1, 2, \ldots, n\}$, $m \in \{1, 2, \ldots, M\}$, are i.i.d. according 
to $P$. The sequential decoder operates according to a two-step 
procedure. The first step consists in making a coarse estimate 
of the location of the sent codeword. Specifically, at time $t$ the 
decoder tries to determine whether the last $n$ output symbols 
are generated by noise or by some codeword on the basis of 
their empirical distribution $\hat{P} = \hat{P}_{y_{t-n+1}^{t}}$. If $D(\hat{P}\|Q_\star) < \alpha$, 
$\hat{P}$ is declared a "noise type," the decoder moves to time $t + 1$, 
and repeats the procedure, i.e., tests whether $\hat{P}_{y_{t-n+2}^{t+1}}$ is a noise 
type. If, instead, $D(\hat{P}\|Q_\star) \ge \alpha$, the decoder marks the current 
time as the beginning of the "decoding window," and proceeds 
to the second step of the decoding procedure. 

$^{12}$ To see this, choose $V = Q_\star$ in the minimization. 

$^{13}$ Let $P^* = P^*(Q)$ be an input distribution $P$ that maximizes (16) for 
a given channel. Since $R \le I(P^*Q) \le H(P^*)$, $P^*$ is uniformly bounded 
away from 0 and 1 for all $\epsilon \ge 0$. This implies that (16) is bounded in the 
limit $\epsilon \to 0$. 

Fig. 11. Lower bound (TLB) to capacity restricted to training-based schemes 
for the channel of Fig. 6. $\alpha_+(R)$ and $\alpha_-(R)$ represent the capacity general 
upper and lower bounds given by Theorems 2 and 3. For this channel the 
training upper bound (TUB) coincides with $\alpha_+(R)$, and hence is not plotted 
separately. 
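
The first decoding step (sliding a window of $n$ outputs and testing whether its empirical distribution is a "noise type") can be sketched as follows. This is an illustration only, not the authors' implementation: the function names are hypothetical, distributions are represented as dictionaries, and time indices are 0-based.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """D(p || q) in nats for distributions given as dicts symbol -> probability."""
    total = 0.0
    for symbol, p_val in p.items():
        if p_val == 0.0:
            continue
        q_val = q.get(symbol, 0.0)
        if q_val == 0.0:
            return math.inf
        total += p_val * math.log(p_val / q_val)
    return total


def empirical_distribution(window):
    """Empirical distribution (type) of a sequence of symbols."""
    counts = Counter(window)
    return {symbol: count / len(window) for symbol, count in counts.items()}


def first_non_noise_time(outputs, n, noise_dist, alpha):
    """Return the first 0-based time t >= n - 1 at which the last n outputs
    are not a noise type, i.e. D(P_hat || Q_star) >= alpha, or None."""
    for t in range(n - 1, len(outputs)):
        window = outputs[t - n + 1 : t + 1]
        if kl_divergence(empirical_distribution(window), noise_dist) >= alpha:
            return t
    return None
```

The returned time marks the beginning of the decoding window; the typicality test of the second step would start from there.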

The second step consists in exactly locating and identifying 
the sent codeword. Once the beginning of the decoding 
window has been marked, the decoder makes a decision the 
first time that the previous $n$ symbols are jointly typical with 
one of the codewords. If no such time is found within $n$ 
successive time steps, the decoder stops and declares a random 
message. The typicality decoder operates as follows.$^{14}$ Let $P_m$ 
be the probability measure induced by codeword $c^n(m)$ and 
the channel, i.e., 

$$P_m(a, b) \triangleq \hat{P}_{c^n(m)}(a)\, Q(b|a), \qquad (a, b) \in \mathcal{X} \times \mathcal{Y}. \tag{18}$$

At time $t$, the decoder computes the empirical distributions 
$\hat{P}_m$ induced by $c^n(m)$ and the $n$ output symbols $y_{t-n+1}^{t}$ for 
all $m \in \{1, 2, \ldots, M\}$. If 

$$|\hat{P}_{c^n(m),\, y_{t-n+1}^{t}}(a, b) - P_m(a, b)| \le \mu$$

for all $(a, b) \in \mathcal{X} \times \mathcal{Y}$ and a unique index $m$, the decoder 
declares message $m$ as the sent message. Otherwise, it moves 
one step ahead and repeats the second step of the decoding 
procedure on the basis of $y_{t-n+2}^{t+1}$, i.e., it tests whether $y_{t-n+2}^{t+1}$ 
is typical with a codeword. 

$^{14}$ In the literature this decoder is often referred to as the "strong typicality" 
decoder. 
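
The typicality test of the second step can be illustrated for a single codeword as follows. The sketch is a simplified illustration under stated assumptions: the function name `is_strongly_typical` is hypothetical, the channel is a dictionary mapping each input symbol to a dictionary of output probabilities, and the uniqueness check over all messages is left to the caller.

```python
from collections import Counter


def is_strongly_typical(codeword, outputs, channel, mu):
    """Check |P_hat(a, b) - P_hat_codeword(a) * Q(b|a)| <= mu for all (a, b).

    codeword, outputs: equal-length sequences of input and output symbols.
    channel: dict input symbol -> dict output symbol -> probability.
    """
    n = len(codeword)
    input_type = Counter(codeword)
    joint_type = Counter(zip(codeword, outputs))
    output_alphabet = {b for row in channel.values() for b in row}
    for a in channel:
        for b in output_alphabet:
            empirical = joint_type.get((a, b), 0) / n
            induced = (input_type.get(a, 0) / n) * channel[a].get(b, 0.0)
            if abs(empirical - induced) > mu:
                return False
    return True
```

A full decoder would evaluate this test for every message and stop only when exactly one codeword passes.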

At the end of the asynchronism time window, i.e., at time 
$A_n + n - 1$, if $\hat{P}_{y_{A_n}^{A_n+n-1}}$ is either a noisy type or if it is typical 
with none of the codewords, the decoder declares a message 
at random. 

Throughout the argument we assume that the typicality 
parameter $\mu$ is a negligible, strictly positive quantity. 

We first show that, on average, a randomly chosen codebook 
combined with the sequential decoding procedure described 
above achieves the rate-exponent pairs $(R, \alpha)$ claimed by the 
theorem. This, as we show at the end of the proof, implies the 
existence of a nonrandom codebook that, together with the 
above decoding procedure, achieves any pair $(R, \alpha)$ claimed 
by the theorem. 

Let $\ln M/n = I(PQ) - \epsilon$, $\epsilon > 0$. We first compute the 
average, over messages and codes, expected reaction delay 
and probability of error. These quantities, by symmetry of the 
encoding and decoding procedures, are the same as the average 
over codes expected reaction delay and probability of error 
conditioned on the sending of a particular message. Below, 
expected reaction delay and error probability are computed 
conditioned on the sending of message $m = 1$. 

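Define the following events: 

$$\mathcal{E}_1 = \{D(\hat{P}_{Y_{\nu}^{\nu+n-1}} \| Q_\star) < \alpha\}, \text{ i.e., } \hat{P}_{Y_{\nu}^{\nu+n-1}} \text{ is a "noise type"},$$
$$\mathcal{E}_2 = \{Y_{\nu}^{\nu+n-1} \text{ is not typical with } C^n(1)\},$$
$$\mathcal{E}_3 = \{D(\hat{P}_{Y_{t-n+1}^{t}} \| Q_\star) \ge \alpha \text{ for some } t < \nu\}.$$

For the reaction delay we have 

$$\begin{aligned} \mathbb{E}_1(\tau_n - \nu)^+ &= \mathbb{E}_1[(\tau_n - \nu)^+ \mathbb{1}(\tau_n \ge \nu + 2n)] \\ &\quad + \mathbb{E}_1[(\tau_n - \nu)^+ \mathbb{1}(\nu + n \le \tau_n < \nu + 2n)] \\ &\quad + \mathbb{E}_1[(\tau_n - \nu)^+ \mathbb{1}(\tau_n < \nu + n)] \\ &\le (A_n + n - 1)\, P_1(\tau_n \ge \nu + 2n) \\ &\quad + 2n\, P_1(\nu + n \le \tau_n < \nu + 2n) + n, \end{aligned} \tag{19}$$

where the subscript 1 in $\mathbb{E}_1$ and $P_1$ indicates conditioning on 
the event that message $m = 1$ is sent. The two probability 
terms on the right-hand side of the second inequality of (19) 
are bounded as follows. 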

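The term $P_1(\tau_n \ge \nu + 2n)$ is upper bounded by the 
probability that the decoding window starts after time $\nu + n - 1$. 
This, in turn, is upper bounded by the probability of the event 
that, at time $\nu + n - 1$, the last $n$ output symbols induce a 
noise type. Therefore, we have 

$$\begin{aligned} P_1(\tau_n \ge \nu + 2n) &\le P_1(\mathcal{E}_1) \\ &\le \sum_{\{V \in \mathcal{P}_n^{\mathcal{Y}}:\ D(V\|Q_\star) < \alpha\}} e^{-nD(V\|(PQ)_Y)} \\ &\le \sum_{\{V \in \mathcal{P}_n^{\mathcal{Y}}:\ D(V\|Q_\star) < \alpha\}} e^{-n\alpha} \\ &\le \mathrm{poly}(n)\, e^{-n\alpha}, \end{aligned} \tag{20}$$

where the second inequality follows from the definition of the 
event $\mathcal{E}_1$ and Fact 2; where the third inequality follows from 
(17) (which implies that if $D(V\|Q_\star) < \alpha$ then necessarily 
$D(V\|(PQ)_Y) \ge \alpha$); and where the fourth inequality follows 
from Fact 1. 

The probability $P_1(\nu + n \le \tau_n < \nu + 2n)$ is at most the 
probability that the decoder has not stopped by time $\nu + n - 1$. 
This probability, in turn, is at most the probability that, at time 
$\nu + n - 1$, the last $n$ output symbols either induce a noisy type, 
or are not typical with the sent codeword $C^n(1)$ (recall that 
message $m = 1$ is sent). By union bound we get 

$$\begin{aligned} P_1(\nu + n \le \tau_n < \nu + 2n) &\le P_1(\tau_n \ge \nu + n) \\ &\le P_1(\mathcal{E}_1) + P_1(\mathcal{E}_2) \\ &\le \mathrm{poly}(n)\, e^{-n\alpha} + o(1) \\ &= o(1) \quad (n \to \infty), \end{aligned} \tag{21}$$

where we used the last three computation steps of (20) to 
bound $P_1(\mathcal{E}_1)$, and where we used [7, Lemma 2.12, p. 34] to 
show that $P_1(\mathcal{E}_2)$ tends to zero as $n$ tends to infinity. From 
(19), (20), and (21), we deduce that 

$$\mathbb{E}_1(\tau_n - \nu)^+ \le n(1 + o(1)) \quad (n \to \infty)$$

since $A_n = e^{n(\alpha - \epsilon)}$, by assumption. 

We now compute $P_1(\mathcal{E})$, the average error probability 
conditioned on sending message $m = 1$. We have 

$$\begin{aligned} P_1(\mathcal{E}) &= P_1(\mathcal{E} \cap \{\tau_n < \nu\}) \\ &\quad + P_1(\mathcal{E} \cap \{\nu \le \tau_n \le \nu + n - 1\}) \\ &\quad + P_1(\mathcal{E} \cap \{\tau_n \ge \nu + n\}) \\ &\le P_1(\tau_n < \nu) + P_1(\mathcal{E} \cap \{\nu \le \tau_n \le \nu + n - 1\}) \\ &\quad + P_1(\tau_n \ge \nu + n) \\ &\le P_1(\mathcal{E}_3) + P_1(\mathcal{E} \cap \{\nu \le \tau_n \le \nu + n - 1\}) \\ &\quad + o(1) \quad (n \to \infty), \end{aligned} \tag{22}$$

where for the last inequality we used the definition of $\mathcal{E}_3$ and 
upper bounded $P_1(\tau_n \ge \nu + n)$ using the last three computation 
steps of (21). 

For $P_1(\mathcal{E}_3)$, we have 

$$\begin{aligned} P_1(\mathcal{E}_3) &= P\Big(\bigcup_{t < \nu} \{D(\hat{P}_{Y_{t-n+1}^{t}} \| Q_\star) \ge \alpha\}\Big) \\ &\le A_n \sum_{\{V \in \mathcal{P}_n^{\mathcal{Y}}:\ D(V\|Q_\star) \ge \alpha\}} e^{-nD(V\|Q_\star)} \\ &\le A_n \sum_{\{V \in \mathcal{P}_n^{\mathcal{Y}}:\ D(V\|Q_\star) \ge \alpha\}} e^{-n\alpha} \\ &\le A_n\, e^{-n\alpha}\, \mathrm{poly}(n) \\ &= o(1) \quad (n \to \infty), \end{aligned} \tag{23}$$

where the first inequality in (23) follows from the union bound 
over time and Fact 2; where the third inequality follows from 
Fact 1; and where the last equality holds since $A_n = e^{n(\alpha - \epsilon)}$, 
by assumption. 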
We now show that 

$$P_1(\mathcal{E} \cap \{\nu \le \tau_n \le \nu + n - 1\}) = o(1) \quad (n \to \infty), \tag{24}$$

which, together with (22) and (23), shows that $P_1(\mathcal{E})$ goes to 
zero as $n \to \infty$. 
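We have 

$$\begin{aligned} P_1(\mathcal{E} \cap \{\nu \le \tau_n \le \nu + n - 1\}) &= P_1\Big(\bigcup_{t=\nu}^{\nu+n-1} \{\mathcal{E} \cap \{\tau_n = t\} \cap \mathcal{E}_3\}\Big) \\ &\quad + P_1\Big(\bigcup_{t=\nu}^{\nu+n-1} \{\mathcal{E} \cap \{\tau_n = t\} \cap \mathcal{E}_3^c\}\Big) \\ &\le P_1(\mathcal{E}_3) + P_1\Big(\bigcup_{t=\nu}^{\nu+n-1} \{\mathcal{E} \cap \{\tau_n = t\} \cap \mathcal{E}_3^c\}\Big) \\ &\le o(1) + P_1(\mathcal{E} \cap \{\tau_n = \nu + n - 1\}) \\ &\quad + P_1\Big(\bigcup_{t=\nu}^{\nu+n-2} \{\mathcal{E} \cap \{\tau_n = t\} \cap \mathcal{E}_3^c\}\Big) \\ &\le o(1) + o(1) \\ &\quad + P_1\Big(\bigcup_{t=\nu}^{\nu+n-2} \{\mathcal{E} \cap \{\tau_n = t\} \cap \mathcal{E}_3^c\}\Big) \quad (n \to \infty) \end{aligned} \tag{25}$$

where the second inequality follows from (23); where the 
fourth inequality follows from the definition of event $\mathcal{E}_2$; and 
where the third inequality follows from the fact that, given 
the correct codeword location, i.e., $\tau_n = \nu + n - 1$, the 
typicality decoder guarantees vanishing error probability since 
we assumed that $\ln M/n = I(PQ) - \epsilon$ (see [7, Chapter 2.1]). 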

The event $\{\mathcal{E} \cap \{\tau_n = t\} \cap \mathcal{E}_3^c\}$, with $\nu \le t \le \nu + n - 2$, 
happens when a block of $n$ consecutive symbols, received 
between $\nu - n + 1$ and $\nu + n - 2$, is jointly typical with a 
codeword other than the sent codeword $C^n(1)$. Consider a 
block $Y^n$ in this range, and let $J \in \mathcal{P}_n^{\mathcal{X},\mathcal{Y}}$ be a typical joint 
type, i.e. 

$$|J(x, y) - P(x)Q(y|x)| \le \mu$$

for all $(x, y) \in \mathcal{X} \times \mathcal{Y}$ (recall that $\mu > 0$ is the typicality 
parameter, which we assume to be a negligible quantity 
throughout the proof). 

For some $1 \le k \le n - 1$, the first $k$ symbols of block $Y^n$ 
are generated by noise, and the remaining $n - k$ symbols are 
generated by the sent codeword, i.e., corresponding to $m = 1$. 
Thus, $Y^n$ is independent of any unsent codeword $C^n(m)$. The 
probability that $C^n(m)$, $m \ne 1$, together with $Y^n$ yields a 
particular type $J$ is upper bounded as follows: 



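$$\begin{aligned} P(\hat{P}_{C^n(m), Y^n} = J) &= \sum_{y^n \in \mathcal{Y}^n} P(Y^n = y^n) \sum_{x^n:\ \hat{P}_{x^n, y^n} = J} P(X^n = x^n) \\ &= \sum_{y^n \in \mathcal{Y}^n} P(Y^n = y^n) \sum_{x^n:\ \hat{P}_{x^n, y^n} = J} e^{-n(H(J_X) + D(J_X\|P))} \\ &\le \sum_{y^n \in \mathcal{Y}^n} P(Y^n = y^n)\, e^{-nH(J_X)}\, |\{x^n : \hat{P}_{x^n, y^n} = J\}| \\ &\le \sum_{y^n \in \mathcal{Y}^n} P(Y^n = y^n)\, e^{-nH(J_X)}\, e^{nH(J_{X|Y})} \\ &\le e^{-nI(J)}, \end{aligned} \tag{26}$$

where $H(J_X)$ denotes the entropy of the left marginal of $J$, 

$$H(J_{X|Y}) \triangleq -\sum_{y \in \mathcal{Y}} J_Y(y) \sum_{x \in \mathcal{X}} J_{X|Y}(x|y) \ln J_{X|Y}(x|y),$$

and where $I(J)$ denotes the mutual information induced by 
$J$. 

The first equality in (26) follows from the independence 
of $C^n(m)$ and $Y^n$, the second equality follows from [11, 
Theorem 11.1.2, p. 349], and the second inequality follows 
from [7, Lemma 2.5, p. 31]. 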

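It follows that the probability that an unsent codeword 
$C^n(m)$ together with $Y^n$ yields a type $J$ that is typical, i.e., 
close to $PQ$, is upper bounded as 

$$P(\hat{P}_{C^n(m), Y^n} = J) \le e^{-n(I(PQ) - \epsilon/2)}$$

for all $n$ large enough, by continuity of the mutual 
information.$^{15}$ 

Note that the set of inequalities (26) holds for any block 
of $n$ consecutive output symbols $Y^n$ that is independent of 
codeword $C^n(m)$.$^{16}$ Hence, from the union bound, it follows 
that 

$$\begin{aligned} P_1\Big(\bigcup_{t=\nu}^{\nu+n-2} \{\mathcal{E} \cap \{\tau_n = t\} \cap \mathcal{E}_3^c\}\Big) &\le n \sum_{m=2}^{M}\ \sum_{\{J \in \mathcal{P}_n^{\mathcal{X},\mathcal{Y}}:\ \forall (x,y),\ |J(x,y) - P(x)Q(y|x)| \le \mu\}} P(\hat{P}_{C^n(m), Y^n} = J) \\ &\le n M e^{-n(I(PQ) - \epsilon/2)}\, \mathrm{poly}(n) \\ &\le e^{-n\epsilon/2}\, \mathrm{poly}(n), \end{aligned} \tag{27}$$

where the second inequality follows from Fact 1 and 
where the third inequality follows from the assumption that 
$\ln M/n = I(PQ) - \epsilon$. Combining (27) with (25) yields (24). 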

So far, we have proved that a random codebook has a 
decoding delay averaged over messages that is at most $n(1 + o(1))$ 
$(n \to \infty)$, and an error probability averaged over messages 
that vanishes as $n \to \infty$, whenever $A_n = e^{n(\alpha - \epsilon)}$, $\epsilon > 0$. 
This, as we now show, implies the existence of nonrandom 
codebooks achieving the same performance, yielding the de- 
sired result. The expurgation arguments we use are standard 
and in the same spirit as those given in 1 1 1 , p. 203-204] or 
m p. 151]. 

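For a particular codebook $\mathcal{C}_n$, let $P(\mathcal{E}|\mathcal{C}_n)$ and 
$\mathbb{E}((\tau_n - \nu)^+|\mathcal{C}_n)$ be the average, over messages, error probability and 
reaction delay, respectively. We have proved that for any $\epsilon > 0$, 

$$\mathbb{E}\big(\mathbb{E}((\tau_n - \nu)^+|\mathcal{C}_n)\big) \le n(1 + \epsilon)$$

and 

$$\mathbb{E}\big(P(\mathcal{E}|\mathcal{C}_n)\big) \le \epsilon$$

for all $n$ large enough. 
Define events 

$$\mathcal{A}_1 = \{\mathbb{E}((\tau_n - \nu)^+|\mathcal{C}_n) \le n(1 + \epsilon)^2\}$$

and 

$$\mathcal{A}_2 = \{P(\mathcal{E}|\mathcal{C}_n) \le \epsilon k\}$$

where $k$ is arbitrary. 

From Markov's inequality it follows that$^{17}$ 

$$P(\mathcal{A}_1 \cap \mathcal{A}_2) \ge 1 - \frac{1}{1 + \epsilon} - \frac{1}{k}.$$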



$^{15}$ The typicality parameter $\mu = \mu(\epsilon) > 0$ is chosen small enough so that 
this inequality holds. 

$^{16}$ Note that the fact that $Y^n$ is partly generated by noise and partly by the 
sent codeword $C^n(1)$ is not used to establish (26). 

$^{17}$ Probability here is averaged over randomly generated codewords. 



Letting $k$ be large enough so that the right-hand side of the 
above inequality is positive, we deduce that there exists a 
particular code $\mathcal{C}_n$ such that 

$$\mathbb{E}((\tau_n - \nu)^+|\mathcal{C}_n) \le n(1 + \epsilon)^2$$

and 

$$P(\mathcal{E}|\mathcal{C}_n) \le \epsilon k.$$

We now remove from $\mathcal{C}_n$ codewords with poor reaction delay 
and error probability. Repeating the argument above with the 
fixed code $\mathcal{C}_n$, we see that a positive fraction of the codewords 
of $\mathcal{C}_n$ have expected decoding delay at most $n(1 + \epsilon)^3$ and error 
probability at most $\epsilon k^2$. By only keeping this set of codewords, 
we conclude that for any $\epsilon > 0$ and all $n$ large enough, there 
exists a rate $R = I(PQ) - \epsilon$ code operating at asynchronism 
level $A = e^{(\alpha - \epsilon)n}$ with maximum error probability less than $\epsilon$. 
■ 
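
The expurgation step in the proof above requires $k$ large enough that the Markov lower bound $1 - \frac{1}{1+\epsilon} - \frac{1}{k}$ is positive, that is, $k > (1 + \epsilon)/\epsilon$. The sketch below computes the smallest such integer with exact rational arithmetic; the function names are hypothetical and only illustrate this arithmetic.

```python
import math
from fractions import Fraction


def markov_lower_bound(eps, k):
    """Lower bound 1 - 1/(1 + eps) - 1/k on the probability of a good code."""
    eps = Fraction(eps)
    return 1 - 1 / (1 + eps) - Fraction(1, k)


def smallest_valid_k(eps):
    """Smallest integer k for which the lower bound is strictly positive."""
    eps = Fraction(eps)
    # Positivity is equivalent to k > (1 + eps) / eps.
    return math.floor((1 + eps) / eps) + 1
```

For instance, with $\epsilon = 1/10$ the bound vanishes at $k = 11$ and becomes positive from $k = 12$ on.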

Remark 2: It is possible to somewhat strengthen the conclusion of Theorem 2 in two ways. First, it can be strengthened by observing that what we actually proved is that the error probability not only vanishes but does so exponentially in n.^{18} Second, it can be strengthened by showing that the proposed random coding scheme achieves (6) with equality. A proof is deferred to Appendix A.

B. Proof of Theorem 3

We show that any rate $R > 0$ coding scheme operates at an asynchronism $\alpha$ bounded from above by $\max_{\mathcal{S}} \min\{\alpha_1, \alpha_2\}$, where $\mathcal{S}$, $\alpha_1$, and $\alpha_2$ are defined in the theorem's statement.

We prove Theorem 3 by establishing the following four claims.

The first claim says that, without loss of generality, we may restrict ourselves to constant composition codes. Specifically, it is possible to expurgate an arbitrary code to make it of constant composition while impacting (asymptotically) neither the rate nor the asynchronism exponent the original code is operating at. In more detail, the expurgated codebook is such that all codewords have the same type, and also such that all codewords have the same type over the first $\Delta_n$ symbols (recall that $\Delta_n = \max_m \mathbb{E}_m(\tau_n - \nu)^+$). The parameter $\delta$ in Theorem 3 corresponds to the ratio $\Delta_n/n$, and $P_1$ and $P_2$ correspond to the empirical types over the first $\Delta_n$ symbols and the whole codeword (all n symbols), respectively.

Fix an arbitrarily small constant $\epsilon > 0$.

Claim 1: Given any coding scheme $\{(\mathcal{C}_n, (\tau_n, \phi_n))\}_{n \ge 1}$ achieving $(R, \alpha)$ with $R > 0$ and $\alpha > 0$, there exists a second coding scheme $\{(\mathcal{C}'_n, (\tau_n, \phi_n))\}_{n \ge 1}$ achieving $(R, \alpha)$ that is obtained by expurgation, i.e., $\mathcal{C}'_n \subset \mathcal{C}_n$, $n = 1, 2, \ldots$, and that has constant composition with respect to some distribution $P_n^1$ over the first

$$d(n) \triangleq \min\{\lfloor (1 + \epsilon)\Delta_n \rfloor, n\} \qquad (28)$$

symbols, and constant composition with respect to some distribution $P_n^2$ over n symbols. (Hence, if $\lfloor (1 + \epsilon)\Delta_n \rfloor \ge n$, then $P_n^1 = P_n^2$.) Distributions $P_n^1$ and $P_n^2$ satisfy Claims 2-4 below.

^{18} Note that the error probability of the typicality decoder given the correct message location, i.e., $\mathbb{P}(\mathcal{E} \cap \{\tau_n = \nu + n - 1\})$, is exponentially small in n [7, Chapter 2].

Distribution $P_n^1$ plays the same role as the codeword distribution for synchronous communication. As such it should induce a large enough input-output channel mutual information to support rate R communication.

Claim 2: For all n large enough

$$R \le I(P_n^1 Q)(1 + \epsilon).$$

Distribution $P_n^2$ is specific to asynchronous communication. Intuitively, $P_n^2$ should induce an output distribution that is sufficiently different from pure noise to allow a decoder to distinguish between noise and any particular transmitted message when the asynchronism level corresponds to $\alpha$. Proper message detection means that the decoder should not overreact to a sent codeword (i.e., declare a message before it is even sent), but also not miss the sent codeword. As an extreme case, it is possible to achieve a reaction delay $\mathbb{E}(\tau - \nu)^+$ equal to zero by setting $\tau = 1$, at the expense of a large probability of error. In contrast, one clearly minimizes the error probability by waiting until the end of the asynchronism window, i.e., by setting $\tau = A_n + n - 1$, at the expense of the rate, which will be negligible in this case.

The ability to properly detect only a single codeword with type $P_n^2$ is captured by condition $\alpha \le \alpha_2$, where $\alpha_2$ is defined in the theorem's statement. This condition is equivalently stated as:

Claim 3: For any $W \in \mathcal{P}_n^{\mathcal{Y}|\mathcal{X}}$ and for all n large enough, at least one of the following two inequalities holds:

$$\alpha < D(W\|Q_\star|P_n^2) + \epsilon,$$
$$\alpha < D(W\|Q|P_n^2) + \epsilon.$$

As it turns out, if the synchronization threshold is finite, $P_n^1$ also plays a role in the decoder's ability to properly detect the transmitted message. This is captured by condition $\alpha \le \alpha_1$, where $\alpha_1$ is defined in the theorem's statement. Intuitively, $\alpha_1$ relates to the probability that the noise produces a string of length n that looks typical with the output of a randomly selected codeword. If $\alpha > \alpha_1$, the noise produces many such strings with high probability, which implies a large probability of error.

Claim 4: For all n large enough,

$$\alpha \le \frac{d(n)}{n}\Big(I(P_n^1 Q) - R + D\big((P_n^1 Q)_{\mathcal{Y}}\,\|\,Q_\star\big)\Big) + \epsilon$$

provided that $\alpha_\circ < \infty$.

Note that, by contrast with the condition in Claim 3, the condition in Claim 4 depends also on the communication rate, since the error leading to the latter condition depends on the number of codewords.
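The right-hand side of Claim 4 is straightforward to evaluate for a concrete channel. The sketch below is illustrative only: it assumes a channel given as a row-stochastic matrix `Q[x][y]`, a noise distribution `q_star`, natural logarithms, and hypothetical function names; `ratio` stands for d(n)/n.

```python
from math import log


def output_distribution(P, Q):
    """(PQ)_Y: output distribution induced by input P through channel Q[x][y]."""
    return [sum(P[x] * Q[x][y] for x in range(len(P))) for y in range(len(Q[0]))]


def divergence(p, q):
    """D(p || q) in nats; infinite if p puts mass where q does not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return float("inf")
            total += pi * log(pi / qi)
    return total


def mutual_information(P, Q):
    """I(PQ) = sum_x P(x) D(Q(.|x) || (PQ)_Y), in nats."""
    py = output_distribution(P, Q)
    return sum(P[x] * divergence(Q[x], py) for x in range(len(P)) if P[x] > 0)


def claim4_bound(P, Q, q_star, R, ratio, eps=0.0):
    """ratio * (I(PQ) - R + D((PQ)_Y || Q_star)) + eps, with ratio = d(n)/n."""
    py = output_distribution(P, Q)
    return ratio * (mutual_information(P, Q) - R + divergence(py, q_star)) + eps
```

For a binary symmetric channel with crossover 0.1, uniform inputs, and noise distributed as the output of input symbol 0, `mutual_information` returns $\ln 2 - h(0.1)$ nats, where h is the binary entropy function.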

Before proving the above claims, we show how they imply Theorem 3. The first part of the Theorem, i.e., when $\alpha_\circ < \infty$, follows from Claims 1-4. To see this, note that the bounds $\alpha_1$ and $\alpha_2$ in the Theorem correspond to the bounds of Claims 4 and 3, respectively, maximized over $P_n^1$ and $P_n^2$. The maximization is subject to the two constraints given by Claims 1 and 2: $P_n^1$ and $P_n^2$ are the empirical distributions of the codewords of $\mathcal{C}'_n$ over the first $\delta n$ symbols ($\delta \in [0, 1]$), and over the entire codeword length, respectively, and condition $R \le I(P_n^1 Q)(1 + \epsilon)$ must be satisfied. Since $\epsilon > 0$ is arbitrary, the result then follows by taking the limit $\epsilon \downarrow 0$ on the above derived bound on $\alpha$.

Similarly, the second part of Theorem 3, i.e., when $\alpha_\circ = \infty$, is a consequence of Claim 3 only.

We now prove the claims. As above, $\epsilon > 0$ is supposed to be an arbitrarily small constant.

Proofs of Claims 1 and 2: We show that for all n large enough, we have

$$\frac{R - \epsilon}{1 + \epsilon} \le \frac{\ln|\mathcal{C}'_n|}{d(n)} \le I(P_n^1 Q) + \epsilon, \qquad (29)$$

where $\mathcal{C}'_n$ is a subset of codewords from $\mathcal{C}_n$ that have constant composition $P_n^1$ over the first d(n) symbols, where d(n) is defined in (28), and constant composition $P_n^2$ over n symbols. This is done via an expurgation argument in the spirit of [12, p. 151] and [11, p. 203-204].

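We first show the left-hand side inequality of (29). Since $\{(\mathcal{C}_n, (\tau_n, \phi_n))\}_{n \ge 1}$ achieves a rate R, by definition (see Definition 1) we have

$$\frac{\ln|\mathcal{C}_n|}{\Delta_n} \ge R - \epsilon/2$$

for all n large enough. Therefore, since $d(n) \le (1 + \epsilon)\Delta_n$,

$$\frac{\ln|\mathcal{C}_n|}{d(n)} \ge \frac{R - \epsilon/2}{1 + \epsilon}$$

for all n large enough.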

Now, group the codewords of $\mathcal{C}_n$ into families such that elements of the same family have the same type over the first d(n) symbols. Let $\mathcal{C}''_n$ be the largest such family and let $P_n^1$ be its type. Within $\mathcal{C}''_n$, consider the largest subfamily $\mathcal{C}'_n$ of codewords that have constant composition over n symbols, and let $P_n^2$ be its type (hence, all the codewords in $\mathcal{C}'_n$ have common type $P_n^1$ over d(n) symbols and common type $P_n^2$ over n symbols).
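The two-stage grouping described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the authors' construction verbatim: codewords are plain Python sequences, ties between equally large families are broken arbitrarily, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import floor


def prefix_length(n, delta_n, eps):
    """d(n) = min(floor((1 + eps) * delta_n), n)."""
    return min(floor((1 + eps) * delta_n), n)


def composition(symbols):
    """Type of a sequence, stored as a sorted tuple of (symbol, count) pairs."""
    return tuple(sorted(Counter(symbols).items()))


def expurgate(codebook, d):
    """Keep the largest family sharing one type on the first d symbols,
    then the largest subfamily of it sharing one type on all symbols."""
    by_prefix = defaultdict(list)
    for c in codebook:
        by_prefix[composition(c[:d])].append(c)
    largest_prefix_family = max(by_prefix.values(), key=len)
    by_full = defaultdict(list)
    for c in largest_prefix_family:
        by_full[composition(c)].append(c)
    return max(by_full.values(), key=len)
```

Since the number of types is polynomial in the sequence length, each stage discards at most a polynomial factor of the codewords, which is why the rate is asymptotically unaffected.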

By assumption, $R > 0$, so $\mathcal{C}_n$ has a number of codewords that is exponential in $\Delta_n$. Due to Fact 1, to establish the left-hand side inequality of (29), i.e., to show that $\mathcal{C}'_n$ achieves essentially the same rate as $\mathcal{C}_n$, it suffices to show that the number of subfamilies in $\mathcal{C}''_n$ is bounded by a polynomial in $\Delta_n$. We do this assuming that $\alpha_\circ < \infty$ and that Claim 4 (to be proved) holds.

By assumption, $\alpha_\circ < \infty$, and thus from Theorem 1 we have that $D((PQ)_{\mathcal{Y}}\|Q_\star) < \infty$ for any input distribution P. Using Claim 4 and the assumption that $\alpha > 0$, we deduce that $\liminf_{n \to \infty} d(n)/n > 0$, which implies that n cannot grow faster than linearly in $\Delta_n$. Therefore, Fact 1 implies that the number of subfamilies of $\mathcal{C}''_n$ is bounded by a polynomial in $\Delta_n$.

We now prove the right-hand side inequality of (29). Letting $\mathcal{E}^c$ denote the event of a correct decoding, Markov's inequality implies that for every message index m,

$$\begin{aligned}
\mathbb{P}_m\big(\{(\tau_n - \nu)^+ \le (1 + \epsilon)\Delta_n\} \cap \mathcal{E}^c\big)
&\ge 1 - \frac{\mathbb{E}_m(\tau_n - \nu)^+}{\Delta_n}\,\frac{1}{1 + \epsilon} - \mathbb{P}_m(\mathcal{E}) \\
&\ge 1 - \frac{1}{1 + \epsilon} - \mathbb{P}_m(\mathcal{E}), \qquad (30)
\end{aligned}$$

since $\Delta_n = \max_m \mathbb{E}_m(\tau_n - \nu)^+$. The right-hand side of (30) is strictly greater than zero for n large enough because an $(R, \alpha)$ coding scheme achieves a vanishing maximum error probability as $n \to \infty$. This means that $\mathcal{C}'_n$ is a good code for the synchronous channel, i.e., for $A = 1$. More precisely, the codebook formed by truncating each codeword in $\mathcal{C}'_n$ to include only the first d(n) symbols achieves a probability of error (asymptotically) bounded away from one with a suitable decoding function. This implies that the right-hand side of (29) holds for n large enough by [7, Corollary 1.4, p. 104]. ■

In establishing the remaining claims of the proof, unless otherwise stated, whenever we refer to a codeword it is assumed to belong to codebook $\mathcal{C}'_n$. Moreover, for convenience, and with only minor abuse of notation, we let M denote the number of codewords in $\mathcal{C}'_n$.

Proof of Claim 3: We fix $W \in \mathcal{P}_n^{\mathcal{Y}|\mathcal{X}}$ and show that for all n large enough, at least one of the two inequalities

$$D(W\|Q|P_n^2) > \alpha - \epsilon,$$
$$D(W\|Q_\star|P_n^2) > \alpha - \epsilon,$$

must hold. To establish this, it may be helpful to interpret W as the true channel behavior during the information transmission period, i.e., as the conditional distribution induced by the transmitted codeword and the corresponding channel output. With this interpretation, $D(W\|Q|P_n^2)$ represents the large deviation exponent of the probability that the underlying channel Q behaves as W when the codeword distribution is $P_n^2$, and $D(W\|Q_\star|P_n^2)$ represents the large deviation exponent of the probability that the noise behaves as W when the codeword distribution is $P_n^2$. As it turns out, if both of the above inequalities are reversed for a certain W, the asynchronism exponent is too large. In fact, in this case both the transmitted message and pure noise are very likely to produce such a W. This, in turn, will confuse the decoder: it will either miss the transmitted codeword or stop before the actual codeword is even sent.

In the sequel, we often use the shorthand notation $\mathcal{T}_W(m)$ for $\mathcal{T}_W^n(c^n(m))$.

Observe first that if n is such that

$$\mathbb{P}_m(Y_\nu^{\nu+n-1} \in \mathcal{T}_W(m)) = 0, \qquad (31)$$

then

$$D(W\|Q|P_n^2) = \infty,$$

by Fact 3. Similarly, observe that if n is such that

$$\mathbb{P}_\star(Y_\nu^{\nu+n-1} \in \mathcal{T}_W(m)) = 0, \qquad (32)$$

where $\mathbb{P}_\star$ denotes the probability under pure noise (i.e., the $Y_i$'s are i.i.d. according to $Q_\star$), then

$$D(W\|Q_\star|P_n^2) = \infty.$$

Since the above two observations hold regardless of m (because all codewords in $\mathcal{C}'_n$ have the same type), Claim 3 holds trivially for any value of n for which (31) or (32) is satisfied.
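The conditional divergences that appear in Claim 3, including the degenerate infinite case just discussed, can be computed directly from their definition. The sketch below assumes finite alphabets, conditional distributions stored as lists of rows indexed by the input symbol, and natural logarithms; the function name is hypothetical. To evaluate $D(W\|Q_\star|P)$, pass a matrix whose rows all equal $Q_\star$.

```python
from math import inf, log


def conditional_divergence(W, Q, P):
    """D(W || Q | P) = sum_x P(x) sum_y W(y|x) ln(W(y|x) / Q(y|x)), in nats."""
    total = 0.0
    for x, px in enumerate(P):
        if px == 0:
            continue  # input symbols that never occur do not contribute
        for w, q in zip(W[x], Q[x]):
            if w > 0:
                if q == 0:
                    return inf  # W asks for an output that Q can never produce
                total += px * w * log(w / q)
    return total
```

The infinite value is returned exactly when W requires, for some input symbol of positive weight, an output that the reference channel produces with probability zero, which matches the situation in which the corresponding type class has zero probability.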

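In the sequel, we thus restrict our attention to values of n for which

$$\mathbb{P}_m(Y_\nu^{\nu+n-1} \in \mathcal{T}_W(m)) \ne 0 \qquad (33)$$

and

$$\mathbb{P}_\star(Y_\nu^{\nu+n-1} \in \mathcal{T}_W(m)) \ne 0. \qquad (34)$$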


Our approach is to use a change of measure to show that if Claim 3 does not hold, then the expected reaction delay grows exponentially with n, implying that the rate is asymptotically equal to zero. To see this, note that any coding scheme that achieves vanishing error probability cannot have $\ln M$ grow faster than linearly with n, simply because of the limitations imposed by the capacity of the synchronous channel. Therefore, if $\mathbb{E}(\tau_n - \nu)^+$ grows exponentially with n, the rate goes to zero exponentially with n. And note that for $\mathbb{E}(\tau_n - \nu)^+$ to grow exponentially, it suffices that $\mathbb{E}_m(\tau_n - \nu)^+$ grows exponentially for at least one message index m, since $\Delta_n = \max_m \mathbb{E}_m(\tau_n - \nu)^+$ by definition.

To simplify the exposition and avoid heavy notation, in the 
following arguments we disregard discrepancies due to the 
rounding of noninteger quantities. We may, for instance, treat 
A/n as an integer even if A is not a multiple of n. This has 
no consequences on the final results, as these discrepancies 
vanish when we consider code with blocklength n tending to 
infinity. 

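We start by lower bounding the reaction delay as^{19}

$$\begin{aligned}
\Delta_n &= \max_m \frac{1}{A_n} \sum_{t=1}^{A_n} \mathbb{E}_{m,t}(\tau_n - t)^+ \\
&\ge \frac{1}{3} \sum_{t=1}^{A_n/3} \mathbb{P}_{m,t}\big((\tau_n - t)^+ \ge A_n/3\big) \\
&\ge \frac{1}{3} \sum_{t=1}^{A_n/3} \mathbb{P}_{m,t}(\tau_n \ge t + A_n/3) \\
&\ge \frac{1}{3} \sum_{t=1}^{A_n/3} \mathbb{P}_{m,t}(\tau_n \ge 2A_n/3), \qquad (35)
\end{aligned}$$

where for the first inequality we used Markov's inequality. The message index m on the right-hand side of (35) will be specified later; for now it may correspond to any message.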

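We lower bound each term $\mathbb{P}_{m,t}(\tau_n \ge 2A_n/3)$ in the above sum as

$$\begin{aligned}
\mathbb{P}_{m,t}(\tau_n \ge 2A_n/3)
&\ge \mathbb{P}_{m,t}(\tau_n \ge 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_W(m)) \times \mathbb{P}_{m,t}(Y_t^{t+n-1} \in \mathcal{T}_W(m)) \\
&\ge \mathbb{P}_{m,t}(\tau_n \ge 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_W(m)) \times e^{-nD_1}\,\mathrm{poly}(n), \qquad (36)
\end{aligned}$$

where $D_1 \triangleq D(W\|Q|P_n^2)$, and where the second inequality follows from Fact 3.

^{19} Recall that the subscripts m, t indicate conditioning on the event that message m starts being sent at time t.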

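The key step is to apply the change of measure

$$\mathbb{P}_{m,t}(\tau_n \ge 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_W(m)) = \mathbb{P}_\star(\tau_n \ge 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_W(m)). \qquad (37)$$

To see that (37) holds, first note that for any $y^n$

$$\mathbb{P}_{m,t}(\tau_n \ge 2A_n/3 \mid Y_t^{t+n-1} = y^n) = \mathbb{P}_\star(\tau_n \ge 2A_n/3 \mid Y_t^{t+n-1} = y^n)$$

since distributions $\mathbb{P}_{m,t}$ and $\mathbb{P}_\star$ differ only over channel outputs $Y_t^{t+n-1}$.

Next, since sequences inside $\mathcal{T}_W(m)$ are permutations of each other,

$$\mathbb{P}_{m,t}(Y_t^{t+n-1} = y^n \mid Y_t^{t+n-1} \in \mathcal{T}_W(m)) = \frac{1}{|\mathcal{T}_W(m)|} = \mathbb{P}_\star(Y_t^{t+n-1} = y^n \mid Y_t^{t+n-1} \in \mathcal{T}_W(m)),$$

we get

$$\begin{aligned}
\mathbb{P}_{m,t}(\tau_n \ge 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_W(m))
&= \sum_{y^n \in \mathcal{T}_W(m)} \mathbb{P}_{m,t}(\tau_n \ge 2A_n/3 \mid Y_t^{t+n-1} = y^n) \times \mathbb{P}_{m,t}(Y_t^{t+n-1} = y^n \mid Y_t^{t+n-1} \in \mathcal{T}_W(m)) \\
&= \sum_{y^n \in \mathcal{T}_W(m)} \mathbb{P}_\star(\tau_n \ge 2A_n/3 \mid Y_t^{t+n-1} = y^n) \times \mathbb{P}_\star(Y_t^{t+n-1} = y^n \mid Y_t^{t+n-1} \in \mathcal{T}_W(m)) \\
&= \mathbb{P}_\star(\tau_n \ge 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_W(m)).
\end{aligned}$$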



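This proves (37). Substituting (37) into the right-hand side of (36) and using (35), we get

$$\begin{aligned}
\Delta_n &\ge e^{-nD_1}\,\mathrm{poly}(n) \times \sum_{t=1}^{A_n/3} \mathbb{P}_\star(\tau_n \ge 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_W(m)) \\
&\ge e^{-n(D_1 - D_2)}\,\mathrm{poly}(n) \times \sum_{t=1}^{A_n/3} \mathbb{P}_\star(\tau_n \ge 2A_n/3,\; Y_t^{t+n-1} \in \mathcal{T}_W(m)),
\end{aligned}$$

where $D_2 \triangleq D(W\|Q_\star|P_n^2)$, and where the last inequality follows from Fact 3. By summing only over the indices that are multiples of n, we obtain the weaker inequality

$$\Delta_n \ge e^{-n(D_1 - D_2)}\,\mathrm{poly}(n) \times \sum_{j=1}^{A_n/3n} \mathbb{P}_\star(\tau_n \ge 2A_n/3,\; Y_{jn}^{jn+n-1} \in \mathcal{T}_W(m)). \qquad (38)$$

Using (38), we show that $\mathbb{E}(\tau_n - \nu)^+$ grows exponentially with n whenever $D_1$ and $D_2$ are both upper bounded by $\alpha - \epsilon$. This, as we saw above, implies that the rate is asymptotically equal to zero, yielding Claim 3.

^{20} Note that the right-hand side of the first inequality in (36) is well-defined because of (33).

Let $A_n = e^{\alpha n}$, and let $\mu \triangleq \epsilon/2$. We rewrite the above summation over $A_n/3n$ indices as a sum of $A_1 = e^{n(\alpha - D_2 - \mu)}/3n$ superblocks of $A_2 = e^{n(D_2 + \mu)}$ indices. We have

$$\sum_{j=1}^{A_n/3n} \mathbb{P}_\star(\tau_n \ge 2A_n/3,\; Y_{jn}^{jn+n-1} \in \mathcal{T}_W(m)) = \sum_{s=1}^{A_1} \sum_{j \in I_s} \mathbb{P}_\star(\tau_n \ge 2A_n/3,\; Y_{jn}^{jn+n-1} \in \mathcal{T}_W(m)),$$

where $I_s$ denotes the sth superblock of $A_2$ indices. Applying the union bound (in reverse), we see that

$$\sum_{s=1}^{A_1} \sum_{j \in I_s} \mathbb{P}_\star(\tau_n \ge 2A_n/3,\; Y_{jn}^{jn+n-1} \in \mathcal{T}_W(m)) \ge \sum_{s=1}^{A_1} \mathbb{P}_\star\Big(\tau_n \ge 2A_n/3,\; \bigcup_{j \in I_s} \{Y_{jn}^{jn+n-1} \in \mathcal{T}_W(m)\}\Big).$$

We now show that each term

$$\mathbb{P}_\star\Big(\tau_n \ge 2A_n/3,\; \bigcup_{j \in I_s} \{Y_{jn}^{jn+n-1} \in \mathcal{T}_W(m)\}\Big) \qquad (39)$$

in the above summation is large, say greater than 1/2, by showing that each of them involves the intersection of two large probability events. This, together with (38), implies that

$$\Delta_n = \mathrm{poly}(n)\,\Omega\big(e^{n(\alpha - D_1 - \mu)}\big) \ge \Omega(\exp(n\epsilon/2)) \qquad (40)$$

since $D_1 \le \alpha - \epsilon$, yielding the desired result.^{21}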
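The bookkeeping behind the superblock decomposition is easy to verify in the log domain. The following sketch is only a sanity check under the stated parametrization ($\mu = \epsilon/2$, asynchronism $e^{\alpha n}$); the function names are hypothetical and rounding to integers is ignored, as in the text.

```python
from math import log


def superblock_log_sizes(alpha, D2, eps, n):
    """Natural logs of A1 = e^{n(alpha - D2 - mu)} / (3n) and A2 = e^{n(D2 + mu)}."""
    mu = eps / 2
    log_A1 = n * (alpha - D2 - mu) - log(3 * n)
    log_A2 = n * (D2 + mu)
    return log_A1, log_A2


def delay_exponent(alpha, D1, eps):
    """Exponent alpha - D1 - mu of the lower bound on the delay, with mu = eps/2."""
    return alpha - D1 - eps / 2
```

The two logarithms add up to $\alpha n - \ln(3n)$, i.e., the superblocks exactly tile the $e^{\alpha n}/3n$ indices, and whenever $D_1 \le \alpha - \epsilon$ the delay exponent is at least $\epsilon/2$.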

Letting $\mathcal{E}$ denote the decoding error event, we have for all n large enough

$$\begin{aligned}
\epsilon &\ge \mathbb{P}_m(\mathcal{E}) \\
&\ge \mathbb{P}_m(\mathcal{E} \mid \nu > 2A_n/3,\; \tau_n < 2A_n/3) \times \mathbb{P}_m(\nu > 2A_n/3,\; \tau_n < 2A_n/3) \\
&\ge \frac{1}{2}\,\mathbb{P}_m(\nu > 2A_n/3)\,\mathbb{P}_m(\tau_n < 2A_n/3 \mid \nu > 2A_n/3) \\
&\ge \frac{1}{6}\,\mathbb{P}_m(\tau_n < 2A_n/3 \mid \nu > 2A_n/3). \qquad (41)
\end{aligned}$$

The third inequality follows by noting that the event $\{\nu > 2A_n/3,\; \tau_n < 2A_n/3\}$ corresponds to the situation where the decoder stops after observing only pure noise. Since a codebook consists of at least two codewords,^{22} such an event causes an error with probability at least 1/2 for at least one message m. Thus, inequality (41) holds under the assumption that m corresponds to such a message.^{23}

^{21} Our proof shows that for all indices n for which $D_1 \le \alpha - \epsilon$ and $D_2 \le \alpha - \epsilon$, (40) holds. Therefore, if $D_1 \le \alpha - \epsilon$ and $D_2 \le \alpha - \epsilon$ for every n large enough, the reaction delay grows exponentially with n, and thus the rate vanishes. In the case where $D_1 \le \alpha - \epsilon$ and $D_2 \le \alpha - \epsilon$ does not hold for all n large enough, but still holds for infinitely many values of n, the corresponding asymptotic rate is still zero by Definition 1.
By assumption, see Section Iml 

^{23} Regarding the fourth inequality in (41), note that $\mathbb{P}_m(\nu > 2A_n/3)$ should be lower bounded by 1/4 instead of 1/3 had we taken into account discrepancies due to rounding of noninteger quantities. As mentioned earlier, we disregard these discrepancies as they play no role asymptotically.



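Since the event $\{\tau_n < 2A_n/3\}$ depends on the channel outputs only up to time $2A_n/3$, we have

$$\mathbb{P}_m(\tau_n < 2A_n/3 \mid \nu > 2A_n/3) = \mathbb{P}_\star(\tau_n < 2A_n/3). \qquad (42)$$

Combining (42) with (41) we get

$$\mathbb{P}_\star(\tau_n \ge 2A_n/3) \ge 1 - 6\epsilon. \qquad (43)$$

Now, because the $Y_{jn}^{jn+n-1}$, $j \in I_s$, are i.i.d. under $\mathbb{P}_\star$,

$$\mathbb{P}_\star\Big(\bigcup_{j \in I_s} \{Y_{jn}^{jn+n-1} \in \mathcal{T}_W(m)\}\Big) = 1 - \big(1 - \mathbb{P}_\star(Y^n \in \mathcal{T}_W(m))\big)^{|I_s|}.$$

From Fact 3 it follows that

$$\mathbb{P}_\star(Y^n \in \mathcal{T}_W(m)) \ge \mathrm{poly}(n)\exp(-nD_2),$$

and by definition $|I_s| = e^{n(D_2 + \mu)}$, so

$$\mathbb{P}_\star\Big(\bigcup_{j \in I_s} \{Y_{jn}^{jn+n-1} \in \mathcal{T}_W(m)\}\Big) = 1 - o(1) \quad (n \to \infty). \qquad (44)$$

Combining (43) and (44), we see that each term (39) involves the intersection of large probability events for at least one message index m. For such a message index, by choosing $\epsilon$ sufficiently small, we see that for all sufficiently large n, every single term (39), $s \in \{1, 2, \ldots, A_1\}$, is bigger than 1/2. ■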
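The convergence in (44) rests on the elementary fact that $1 - (1 - p)^N \to 1$ when $Np \to \infty$. A minimal numerical sketch follows, with the polynomial factor from Fact 3 dropped (an assumption made only to keep the example short) and with a hypothetical function name; `log1p` and `expm1` are used because p is far below machine precision relative to 1.

```python
from math import exp, expm1, log1p


def prob_some_block_typical(n, D2, mu):
    """1 - (1 - p)^N with p = e^{-n D2} and N = e^{n (D2 + mu)}."""
    p = exp(-n * D2)
    N = exp(n * (D2 + mu))
    # (1 - p)^N = exp(N * log(1 - p)); expm1/log1p keep this accurate for tiny p.
    return -expm1(N * log1p(-p))
```

With $\mu = 0$ the expected number of typical blocks stays at one and the probability settles near $1 - e^{-1}$; any $\mu > 0$ makes the expected count grow like $e^{n\mu}$ and pushes the probability to one.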

Finally, to establish the remaining Claim 4, we make use of Theorem 5, whose proof is provided in Appendix B. This theorem implies that any nontrivial codebook contains a (large) set of codewords whose rate is almost the same as the original codebook and whose error probability decays faster than polynomially, say as $e^{-\sqrt{n}}$, with a suitable decoder. Note that we don't use the full implication of Theorem 5.

Proof of Claim 4: The main idea behind the proof is that if Claim 4 does not hold, the noise is likely to produce an output that is "typical" with a codeword before the message is even sent, which means that any decoder must have large error probability. Although the idea is fairly simple, it turns out that a suitable definition for "typical" set and its related error probability analysis make the proof somewhat lengthy.

Proceeding formally, consider inequality (30). This inequality says that, with nonzero probability, the decoder makes a correct decision and stops soon after the beginning of the information transmission period. This motivates the definition of a new random process, which we call the modified output process. With a slight abuse of notation, in the remainder of the proof we use $Y_1, Y_2, \ldots, Y_{A+n-1}$ to denote the modified output process. The modified output process is generated as if the sent codeword were truncated at the position $\nu + d(n)$, where d(n) is defined in (28). Hence, this process can be thought of as the random process "viewed" by the sequential decoder.

Specifically, the distribution of the modified output process is as follows. If

$$n \ge \lfloor \Delta_n(1 + \epsilon) \rfloor,$$

then the $Y_i$'s for

$$i \in \{1, \ldots, \nu - 1\} \cup \{\nu + \lfloor \Delta_n(1 + \epsilon) \rfloor, \ldots, A_n + n - 1\}$$

are i.i.d. according to $Q_\star$, whereas the block

$$Y_\nu, Y_{\nu+1}, \ldots, Y_{\nu + \lfloor \Delta_n(1 + \epsilon) \rfloor - 1}$$

is distributed according to $Q(\cdot|c^{d(n)})$, the output distribution given that a randomly selected codeword has been transmitted. Note that, in the conditioning, we use $c^{d(n)}$ instead of $c^{d(n)}(m)$ to emphasize that the output distribution is averaged over all possible messages, i.e., by definition

$$Q(y^{d(n)}|c^{d(n)}) = \frac{1}{M} \sum_{m=1}^{M} Q(y^{d(n)}|c^{d(n)}(m)).$$

Instead, if

$$n < \lfloor \Delta_n(1 + \epsilon) \rfloor,$$

then the modified output process has the same distribution as the original one, i.e., the $Y_i$'s for

$$i \in \{1, \ldots, \nu - 1\} \cup \{\nu + n, \ldots, A_n + n - 1\}$$

are i.i.d. according to $Q_\star$, whereas the block

$$Y_\nu, Y_{\nu+1}, \ldots, Y_{\nu+n-1}$$

is distributed according to $Q(\cdot|c^n)$.

Consider the following augmented decoder that, in addition to declaring a message, also outputs the time interval

$$[\tau_n - \lfloor \Delta_n(1 + \epsilon) \rfloor + 1,\; \tau_n - \lfloor \Delta_n(1 + \epsilon) \rfloor + 2,\; \ldots,\; \tau_n],$$

of size $\lfloor \Delta_n(1 + \epsilon) \rfloor$. A simple consequence of the right-hand side of (30) being (asymptotically) bounded away from zero is that, for n large enough, if the augmented decoder is given a modified output process instead of the original one, with a strictly positive probability it declares the correct message, and the time interval it outputs contains $\nu$.

Now, suppose the decoder is given the modified output process and that it is revealed that the (possibly truncated) sent codeword was sent in one of the

$$r_n = \left\lfloor \frac{(A_n + n - 1) - (\nu \bmod d(n))}{d(n)} \right\rfloor \qquad (45)$$

consecutive blocks of duration d(n), as shown in Fig. 12. Using this additional knowledge, the decoder can now both declare the sent message and output a list of

$$\ell_n = \lceil \lfloor \Delta_n(1 + \epsilon) \rfloor / d(n) \rceil \qquad (46)$$

block positions, one of which corresponds to the sent message, with a probability bounded away from zero for all n large enough. To do this the decoder, at time $\tau_n$, declares the decoded message and declares the $\ell_n$ blocks that overlap with the time indices in

$$\{\tau_n - \lfloor \Delta_n(1 + \epsilon) \rfloor + 1,\; \tau_n - \lfloor \Delta_n(1 + \epsilon) \rfloor + 2,\; \ldots,\; \tau_n\}.$$
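The block parsing used by the augmented decoder can be made concrete as follows. This is a sketch under explicit assumptions that are not in the text: time indices are 0-based, blocks are aligned so that one of them starts exactly at $\nu$, $r_n$ counts the complete blocks of length d(n) that fit in the $A + n - 1$ outputs, and all function names are hypothetical.

```python
from math import ceil, floor


def num_blocks(A, n, nu, d):
    """r_n: number of complete length-d blocks, aligned with nu mod d, in A + n - 1 outputs."""
    return ((A + n - 1) - (nu % d)) // d


def list_size(delta_n, eps, d):
    """l_n = ceil(floor(delta_n * (1 + eps)) / d)."""
    return ceil(floor(delta_n * (1 + eps)) / d)


def overlapping_blocks(tau, window, nu, d):
    """Indices (0-based) of aligned blocks meeting the times tau - window + 1, ..., tau."""
    offset = nu % d  # block j covers times offset + j*d, ..., offset + (j+1)*d - 1
    first = max(tau - window + 1, offset)
    return list(range((first - offset) // d, (tau - offset) // d + 1))
```

In this toy indexing, the block that carries the codeword has index `(nu - nu % d) // d`, and it appears in the output of `overlapping_blocks` whenever the decoder stops within `window` time steps of the start of transmission.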

We now show that the above task, which consists of declaring the sent message and producing a list of $\ell_n$ blocks of size d(n), one of which is the output of the transmitted message, can be performed only if $\alpha$ satisfies Claim 4. To that aim we consider the performance of the (optimal) maximum likelihood decoder that observes output sequences of maximal length

$$d(n) \cdot r_n.$$

Fig. 12. Parsing of the entire received sequence of size $A + n - 1$ into $r_n$ blocks of length d(n), one of which is generated by the sent message, while the others are generated by noise.



Given a sample $y_1, y_2, \ldots, y_{A+n-1}$ of the modified output process, and its parsing into consecutive blocks of duration d(n), the optimal decoder outputs a list of $\ell_n$ blocks that are most likely to occur. More precisely, the maximum likelihood $\ell_n$-list decoder operates as follows. For each message m, it finds a list of $\ell_n$ blocks $y^{d(n)}$ (among all $r_n$ blocks) that maximize the ratio

$$\frac{Q(y^{d(n)}|c^{d(n)}(m))}{Q_\star(y^{d(n)})} \qquad (47)$$

and computes the sum of these ratios. The maximum likelihood $\ell_n$-list decoder then outputs the list whose sum is maximal, and declares the corresponding message.^{24}
The rest of the proof consists in deriving an upper bound on the probability of correct maximum likelihood $\ell_n$-list decoding, and showing that this bound tends to zero if Claim 4 is not satisfied. To that aim, we first quantify the probability that the noise distribution $Q_\star$ outputs a sequence that is typical with a codeword, since the performance of the maximum likelihood $\ell_n$-list decoder depends on this probability, as we show below.

By assumption, $(\mathcal{C}'_n, (\tau_n, \phi_n))$ achieves a probability of error $\epsilon'_n \to 0$ as $n \to \infty$ at the asynchronism exponent $\alpha$. This implies that $\mathcal{C}'_n$ can also achieve a nontrivial error probability on the synchronous channel (i.e., with $A = 1$). Specifically, by using the same argument as for (30), we deduce that we can use $\mathcal{C}'_n$ on the synchronous channel, force decoding to happen at the fixed time

$$d(n) = \min\{n, \lfloor (1 + \epsilon)\Delta_n \rfloor\},$$

where $\Delta_n$ corresponds to the reaction delay obtained by $(\mathcal{C}'_n, (\tau_n, \phi_n))$ in the asynchronous setting, and guarantee a (maximum) probability of error $\epsilon''_n$ such that

$$\epsilon''_n \le \frac{1}{1 + \epsilon} + \epsilon'_n$$

with a suitable decoder. Since the right-hand side of the above inequality is strictly below one for n large enough, Theorem 5 with $q = 1/4$ implies that the code $\mathcal{C}'_n$ has a large subcode $\tilde{\mathcal{C}}_n$, i.e., of almost the same rate with respect to d(n), that, together with an appropriate decoding function $\tilde{\phi}_n$, achieves a maximum error probability at most equal to



e„ = 2(71 + exp(- V^/(21n2)) 



(48) 



for all n large enough. 



^{24} To see this, consider a channel output $y^{d_n r_n}$ that is composed of $r_n$ consecutive blocks of size $d_n$, where the jth block is generated by codeword $c^{d(n)}(m)$ and where all the other blocks are generated by noise. The probability of this channel output is

$$\mathbb{P}(y^{d_n r_n}|m, j) = Q(y^{d(n)}(j)|c^{d(n)}(m)) \prod_{i \ne j} Q_\star(y^{d(n)}(i)),$$

where $y^{d(n)}(j)$, $j \in \{1, 2, \ldots, r_n\}$, denotes the jth block of $y^{d_n r_n}$.






We now start a digression on the code $(\tilde{\mathcal{C}}_n, \tilde{\phi}_n)$ when used on channel Q synchronously. The point is to exhibit a set of "typical output sequences" that cause the decoder $\tilde{\phi}_n$ to make an error with "large probability." We then move back to the asynchronous channel Q and show that when Claim 4 does not hold, the noise distribution $Q_\star$ is likely to produce typical output sequences, thereby inducing the maximum likelihood $\ell_n$-list decoder into error.

Unless stated otherwise, we now consider $(\tilde{\mathcal{C}}_n, \tilde{\phi}_n)$ when used on the synchronous channel. In particular, error events are defined with respect to this setting.

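The set of typical output sequences is obtained through a few steps. We first define the set $\mathcal{A}_m$ with respect to codeword $c^{d(n)}(m) \in \tilde{\mathcal{C}}_n$ as

$$\mathcal{A}_m \triangleq \Big\{ y^{d(n)} \in \mathcal{T}_W(c^{d(n)}(m)) \text{ with } W \in \mathcal{P}_n^{\mathcal{Y}|\mathcal{X}} : \mathbb{P}\big(\mathcal{T}_W(c^{d(n)}(m)) \,\big|\, c^{d(n)}(m)\big) \ge \sqrt{\epsilon_{d(n)}} \Big\} \qquad (49)$$

where $\epsilon_n$ is defined in (48).

Note that, by using Fact 3, it can easily be checked that $\mathcal{A}_m$ is nonempty for n large enough (depending on $|\mathcal{X}|$ and $|\mathcal{Y}|$), which we assume throughout the argument. For a fixed m, consider the set of sequences in $\mathcal{A}_m$ that maximize (47). These sequences form a set $\mathcal{T}_{\bar{Q}}(c^{d(n)}(m))$, for some $\bar{Q} \in \mathcal{P}_n^{\mathcal{Y}|\mathcal{X}}$. It follows that for every message index m for which $c^{d(n)}(m) \in \tilde{\mathcal{C}}_n$, we have

$$\begin{aligned}
\epsilon_{d(n)} &\ge \mathbb{P}_m(\mathcal{E}) \\
&\ge \mathbb{P}_m\big(\mathcal{E} \mid \{Y_\nu^{\nu+d(n)-1} \in \mathcal{T}_{\bar{Q}}(c^{d(n)}(m))\}\big) \times \mathbb{P}_m\big(\{Y_\nu^{\nu+d(n)-1} \in \mathcal{T}_{\bar{Q}}(c^{d(n)}(m))\}\big) \\
&\ge \mathbb{P}_m\big(\mathcal{E} \mid \{Y_\nu^{\nu+d(n)-1} \in \mathcal{T}_{\bar{Q}}(c^{d(n)}(m))\}\big)\,\sqrt{\epsilon_{d(n)}} \\
&\ge \mathbb{P}_m\big(\mathcal{E} \mid \{Y_\nu^{\nu+d(n)-1} \in \mathcal{B}_m\}\big) \times \mathbb{P}_m\big(\{Y_\nu^{\nu+d(n)-1} \in \mathcal{B}_m\} \mid \{Y_\nu^{\nu+d(n)-1} \in \mathcal{T}_{\bar{Q}}(c^{d(n)}(m))\}\big) \times \sqrt{\epsilon_{d(n)}} \\
&\ge \frac{1}{2} \times \mathbb{P}_m\big(\{Y_\nu^{\nu+d(n)-1} \in \mathcal{B}_m\} \mid \{Y_\nu^{\nu+d(n)-1} \in \mathcal{T}_{\bar{Q}}(c^{d(n)}(m))\}\big) \times \sqrt{\epsilon_{d(n)}} \qquad (50)
\end{aligned}$$

where for the third inequality we used the definition of $\bar{Q}$; where on the right-hand side of the fourth inequality we defined the set

$$\mathcal{B}_m \triangleq \Big\{ y^{d(n)} \in \mathcal{T}_{\bar{Q}}(c^{d(n)}(m)) \cap \Big( \bigcup_{m' \ne m} \mathcal{T}_{\bar{Q}}(c^{d(n)}(m')) \Big) \Big\};$$

and where the fifth inequality follows from this definition.^{25} From (50) we get

$$\mathbb{P}_m\big(\{Y_\nu^{\nu+d(n)-1} \in \mathcal{B}_m\} \mid \{Y_\nu^{\nu+d(n)-1} \in \mathcal{T}_{\bar{Q}}(c^{d(n)}(m))\}\big) \le 2\sqrt{\epsilon_{d(n)}}. \qquad (51)$$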

^{25} Note that, given that message m is sent, if the channel produces a sequence in $\mathcal{B}_m$ at its output, the (standard) optimal maximum likelihood decoder makes an error with probability at least half. Hence the decoding rule $\tilde{\phi}_n$ also makes an error with probability at least half.



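Therefore, by defining $\tilde{\mathcal{B}}_m$ as

$$\tilde{\mathcal{B}}_m \triangleq \mathcal{T}_{\bar{Q}}(c^{d(n)}(m)) \setminus \mathcal{B}_m,$$

the complement of $\mathcal{B}_m$ in $\mathcal{T}_{\bar{Q}}(c^{d(n)}(m))$, it follows from (51) that

$$|\tilde{\mathcal{B}}_m| > \big(1 - 2\sqrt{\epsilon_{d(n)}}\big)\,|\mathcal{T}_{\bar{Q}}(c^{d(n)}(m))|,$$

since under $\mathbb{P}_m$ all the sequences in $\mathcal{T}_{\bar{Q}}(c^{d(n)}(m))$ are equiprobable.

The set $\bigcup_{m' \ne m} \tilde{\mathcal{B}}_{m'}$ is the sought set of "typical output sequences" that causes the decoder to make an error with "high probability" conditioned on the sending of message m and conditioned on the channel outputting a sequence in $\mathcal{T}_{\bar{Q}}(c^{d(n)}(m))$. This ends our digression on $(\tilde{\mathcal{C}}_n, \tilde{\phi}_n)$.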

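We now compute a lower bound on the probability under $Q_\star$ of producing a sequence in $\bigcup_{m=2}^{M} \tilde{\mathcal{B}}_m$. Because the sets $\{\tilde{\mathcal{B}}_m\}$ are disjoint, we deduce that

$$\begin{aligned}
\Big| \bigcup_{m=2}^{M} \tilde{\mathcal{B}}_m \Big| &\ge \big(1 - 2\sqrt{\epsilon_{d(n)}}\big) \sum_{m=2}^{M} |\mathcal{T}_{\bar{Q}}(c^{d(n)}(m))| \\
&\ge \big(1 - 2\sqrt{\epsilon_{d(n)}}\big)\,(M - 1)\,(d(n) + 1)^{-|\mathcal{X}|\cdot|\mathcal{Y}|}\, e^{d(n) H(\bar{Q}|P_n^1)} \\
&\ge \frac{1}{(4n)^{|\mathcal{X}||\mathcal{Y}|}}\, e^{d(n)\left(H(\bar{Q}|P_n^1) + \ln M / d(n)\right)} \qquad (52)
\end{aligned}$$

for all n large enough. For the second inequality we used [7, Lemma 2.5, p. 31]. For the third inequality we used the fact that $d(n) \le n$, $M \ge 2$, $(1 - 2\sqrt{\epsilon_{d(n)}}) \ge 1/2$ for n large enough,^{26} and that, without loss of generality, we may assume that $|\mathcal{X}| \cdot |\mathcal{Y}| \ge 2$ since the synchronous capacity C is non-zero, as we assume throughout the paper. Hence we get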



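$$\begin{aligned}
Q_\star\Big( \bigcup_{m=2}^{M} \tilde{\mathcal{B}}_m \Big) &= \sum_{y^{d(n)} \in \bigcup_{m=2}^{M} \tilde{\mathcal{B}}_m} Q_\star(y^{d(n)}) \\
&\ge \Big| \bigcup_{m=2}^{M} \tilde{\mathcal{B}}_m \Big| \min_{y^{d(n)} \in \bigcup_{m=2}^{M} \tilde{\mathcal{B}}_m} Q_\star(y^{d(n)}) \\
&\ge \frac{1}{(4n)^{|\mathcal{X}||\mathcal{Y}|}}\, e^{d(n)\left(H(\bar{Q}|P_n^1) + (\ln M)/d(n)\right)} \times e^{-d(n)\left(D((P_n^1 \bar{Q})_{\mathcal{Y}} \| Q_\star) + H((P_n^1 \bar{Q})_{\mathcal{Y}})\right)}
\end{aligned}$$

for all n large enough, where for the second inequality we used (52) and [11, Theorem 11.1.2, p. 349]. Letting

$$e_n \triangleq I(P_n^1 \bar{Q}) - (\ln M)/d(n) + D\big((P_n^1 \bar{Q})_{\mathcal{Y}} \| Q_\star\big),$$

we thus have

$$Q_\star\Big( \bigcup_{m=2}^{M} \tilde{\mathcal{B}}_m \Big) \ge \frac{1}{(4n)^{|\mathcal{X}||\mathcal{Y}|}}\, e^{-e_n \cdot d(n)} \qquad (53)$$

for n large enough.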
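The passage from the product of two exponentials to the single exponent $e_n$ uses only the identity $I = H(\text{output}) - H(\text{output} \mid \text{input})$. Writing $\bar{Q}$ for the conditional type selected after (49) (the bar is a notational assumption made here for readability), the step can be spelled out as:

```latex
\begin{align*}
& e^{d(n)\left(H(\bar{Q}|P_n^1) + \frac{\ln M}{d(n)}\right)}
  \cdot e^{-d(n)\left(D((P_n^1\bar{Q})_{\mathcal{Y}}\|Q_\star) + H((P_n^1\bar{Q})_{\mathcal{Y}})\right)} \\
&\quad = \exp\!\Big\{-d(n)\Big(\underbrace{H((P_n^1\bar{Q})_{\mathcal{Y}}) - H(\bar{Q}|P_n^1)}_{=\, I(P_n^1\bar{Q})}
  - \frac{\ln M}{d(n)} + D\big((P_n^1\bar{Q})_{\mathcal{Y}}\|Q_\star\big)\Big)\Big\} \\
&\quad = e^{-e_n \cdot d(n)}.
\end{align*}
```

In words, the number of typical sequences contributes the conditional entropy and the codebook size, while the noise probability of each such sequence contributes the output entropy and the divergence from the noise distribution.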

Using (53), we now prove Claim 4 by contradiction. Specifically, assuming that

$$\alpha > \frac{d(n)}{n}\, e_n + \epsilon/2 \quad \text{for infinitely many indices } n, \qquad (54)$$

we prove that, given message m = 1 is sent, the probability of error of the maximum likelihood $\ell_n$-list decoder does not converge to zero. As a final step, we prove that the opposite of (54) implies Claim 4.

^{26} Note that $d(n) \to \infty$ as $n \to \infty$ since the coding scheme under consideration achieves a strictly positive rate.
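Define the events

$$\mathcal{E}_1 = \{Y_\nu^{\nu+d(n)-1} \notin \mathcal{A}_1\},$$

$$\mathcal{E}_2 = \Big\{ Z \le \frac{1}{2}\,\frac{1}{(4n)^{2|\mathcal{X}||\mathcal{Y}|}}\, e^{\alpha n - e_n \cdot d(n)} \Big\},$$

where $\mathcal{A}_1$ is defined in (49), and where Z denotes the random variable that counts the number of blocks generated by $Q_\star$ that are in $\bigcup_{m=2}^{M} \tilde{\mathcal{B}}_m$. Define also the complement set

$$\mathcal{E}_3 \triangleq (\mathcal{E}_1 \cup \mathcal{E}_2)^c.$$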

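The probability that the maximum likelihood $\ell_n$-list decoder makes a correct decision given that message m = 1 is sent is upper bounded as

$$\begin{aligned}
\mathbb{P}_1(\mathcal{E}^c) &\le \sum_{i=1}^{3} \mathbb{P}_1(\mathcal{E}^c \cap \mathcal{E}_i) \\
&\le \mathbb{P}_1(\mathcal{E}_1) + \mathbb{P}_1(\mathcal{E}_2) + \mathbb{P}_1(\mathcal{E}^c|\mathcal{E}_3). \qquad (55)
\end{aligned}$$

From the definition of $\mathcal{A}_1$, we have

$$\mathbb{P}_1(\mathcal{E}_1) = o(1) \quad (n \to \infty). \qquad (56)$$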

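Now for $\mathbb{P}_1(\mathcal{E}_2)$. There are $r_n - 1$ blocks independently generated by $Q_\star$ ($r_n$ is defined in (45)). Each of these blocks has a probability at least equal to the right-hand side of (53) to fall within $\bigcup_{m=2}^{M} \tilde{\mathcal{B}}_m$. Hence, using (53) we get

$$\begin{aligned}
\mathbb{E}_1 Z &\ge (r_n - 1)\,\frac{1}{(4n)^{|\mathcal{X}||\mathcal{Y}|}}\, e^{-e_n d(n)} \\
&\ge \frac{1}{(4n)^{2|\mathcal{X}||\mathcal{Y}|}}\, e^{\alpha n - e_n d(n)}, \qquad (57)
\end{aligned}$$

since $r_n \ge e^{\alpha n}/n$. Therefore,

$$\begin{aligned}
\mathbb{P}_1(\mathcal{E}_2) &\le \mathbb{P}_1\big(Z \le (\mathbb{E}_1 Z)/2\big) \\
&\le \frac{4}{\mathbb{E}_1 Z} \\
&\le \mathrm{poly}(n)\, e^{-\alpha n + e_n d(n)} \qquad (58)
\end{aligned}$$

where the first inequality follows from (57) and the definition of $\mathcal{E}_2$; where for the second inequality we used Chebyshev's inequality and the fact that the variance of a binomial is upper bounded by its mean; and where for the third inequality we used (57).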
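The Chebyshev step can be checked numerically for a binomial count. The sketch below compares the exact lower tail $\mathbb{P}(Z \le \mathbb{E}Z/2)$ with the bound $4/\mathbb{E}Z$; the function names are hypothetical, and the exact sum is only practical for moderate N.

```python
from math import comb


def binomial_cdf(k, N, p):
    """P(Z <= k) for Z ~ Binomial(N, p), computed exactly term by term."""
    return sum(comb(N, i) * p**i * (1 - p)**(N - i) for i in range(k + 1))


def chebyshev_half_mean_bound(N, p):
    """Bound 4 / E[Z] on P(Z <= E[Z] / 2), using Var(Z) <= E[Z]."""
    return 4 / (N * p)
```

The bound follows from $\mathbb{P}(Z \le \mathbb{E}Z/2) \le \mathbb{P}(|Z - \mathbb{E}Z| \ge \mathbb{E}Z/2) \le \mathrm{Var}(Z)/(\mathbb{E}Z/2)^2 \le 4/\mathbb{E}Z$. It is loose for small examples, but it is all the proof needs, since $\mathbb{E}_1 Z$ grows exponentially under (54).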

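Finally for $\mathbb{P}_1(\mathcal{E}^c|\mathcal{E}_3)$. Given $\mathcal{E}_3$, the decoder sees at least

$$\frac{1}{2}\,\frac{1}{(4n)^{2|\mathcal{X}||\mathcal{Y}|}}\, e^{\alpha n - e_n \cdot d(n)}$$

time slots whose corresponding ratios (47) are at least as large as the one induced by the correct block $Y_\nu^{\nu+d(n)-1}$. Hence, given $\mathcal{E}_3$, the decoder produces a list of $\ell_n$ block positions, one of which corresponds to the sent message, with probability at most

$$\begin{aligned}
\mathbb{P}_1(\mathcal{E}^c|\mathcal{E}_3) &\le \ell_n \Big( \frac{1}{2}\,\frac{1}{(4n)^{2|\mathcal{X}||\mathcal{Y}|}}\, e^{\alpha n - e_n \cdot d(n)} \Big)^{-1} \\
&= \mathrm{poly}(n)\, e^{-\alpha n + e_n \cdot d(n)}, \qquad (59)
\end{aligned}$$

where the first inequality follows from the union bound, and where for the equality we used the fact that finite rate implies $\ell_n = \mathrm{poly}(n)$.^{27}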

From (55), (56), (58), and (59), the probability that the maximum likelihood $\ell_n$-list decoder makes a correct decision, $\mathbb{P}_1(\mathcal{E}^c)$, is arbitrarily small for infinitely many indices n whenever (54) holds. Therefore, to achieve vanishing error probability we must have, for all n large enough,

$$\alpha \le \frac{d(n)}{n}\Big( I(P_n^1 \bar{Q}) - (\ln M)/d(n) + D\big((P_n^1 \bar{Q})_{\mathcal{Y}} \| Q_\star\big) \Big) + \epsilon/2. \qquad (60)$$



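We now show, via a continuity argument, that the above condition implies Claim 4. Recall that $\bar{Q} \in \mathcal{P}_n^{\mathcal{Y}|\mathcal{X}}$, defined just after (49), depends on n and has the property

$$\mathbb{P}\big(\mathcal{T}_{\bar{Q}}(c^{d(n)}(m)) \,\big|\, c^{d(n)}(m)\big) \ge \sqrt{\epsilon_{d(n)}}. \qquad (61)$$

Now, from Fact 3 we also have the upper bound

$$\mathbb{P}\big(\mathcal{T}_{\bar{Q}}(c^{d(n)}(m)) \,\big|\, c^{d(n)}(m)\big) \le e^{-d(n) D(\bar{Q}\|Q|P_n^1)}. \qquad (62)$$

Since $\epsilon_{d(n)} = \Omega(e^{-\sqrt{d(n)}})$, from (61) and (62) we get

$$D(\bar{Q}\|Q|P_n^1) \to 0 \quad \text{as } n \to \infty,$$

and therefore

$$\|P_n^1 \bar{Q} - P_n^1 Q\| \to 0 \quad \text{as } n \to \infty,$$

where $\|\cdot\|$ denotes the $L_1$ norm. Hence, by continuity of the divergence, condition (60) gives, for all n large enough,

$$\alpha \le \frac{d(n)}{n}\Big( I(P_n^1 Q) - (\ln M)/d(n) + D\big((P_n^1 Q)_{\mathcal{Y}} \| Q_\star\big) \Big) + \epsilon \qquad (63)\text{–}(64)$$

which yields Claim 4.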



C. Proof of Corollary 3

By assumption $\alpha_\circ$ is nonzero since divergence is always non-negative. This implies that the synchronous capacity is nonzero by the last claim of Theorem 1. This, in turn, implies that $(R, \alpha)$ is achievable for some sufficiently small $R > 0$ and $\alpha > 0$ by [3, Corollary 1].

Using Theorem 3,

$$\alpha \le \alpha(R) \le \max_{\mathcal{S}} \alpha_2 \qquad (65)$$

where $\alpha_2$ is given by expression (10). In this expression, by letting $W = Q_\star$ in the minimization, we deduce that $\alpha_2 \le D(Q_\star\|Q|P_2)$, and therefore

$$\begin{aligned}
\max_{\mathcal{S}} \alpha_2 &\le \max_{\mathcal{S}} D(Q_\star\|Q|P_2) \\
&= \max_{P_2} D(Q_\star\|Q|P_2) \\
&= \max_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} Q_\star(y) \ln \frac{Q_\star(y)}{Q(y|x)} \\
&= \max_{x \in \mathcal{X}} D(Q_\star\|Q(\cdot|x)),
\end{aligned}$$

and from (65) we get

$$\alpha \le \max_{x \in \mathcal{X}} D(Q_\star\|Q(\cdot|x)).$$

Since, by assumption,

$$\alpha_\circ > \max_{x \in \mathcal{X}} D(Q_\star\|Q(\cdot|x)),$$

and since $\alpha_\circ = \alpha(R = 0)$ by Theorem 1, it follows that $\alpha(R)$ is discontinuous at $R = 0$. ■

^{27} This follows from the definition of rate $R = \ln M / \mathbb{E}(\tau - \nu)^+$, the fact that $\ln M / n \le C$ for reliable communication, and the definition of $\ell_n$ in (46).
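The condition of Corollary 3 can be tested on a concrete channel by comparing two divergences. The sketch below computes $\max_x D(Q_\star\|Q(\cdot|x))$ and, as an assumption not restated in this section, uses the characterization $\alpha_\circ = \max_x D(Q(\cdot|x)\|Q_\star)$ of the synchronization threshold from the authors' earlier work. Function names are hypothetical and logarithms are natural.

```python
from math import inf, log


def divergence(p, q):
    """D(p || q) in nats; infinite if p puts mass where q does not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return inf
            total += pi * log(pi / qi)
    return total


def positive_rate_ceiling(Q, q_star):
    """max_x D(Q_star || Q(.|x)): upper bound on alpha at any positive rate."""
    return max(divergence(q_star, row) for row in Q)


def sync_threshold(Q, q_star):
    """max_x D(Q(.|x) || Q_star); assumed here to equal alpha at zero rate."""
    return max(divergence(row, q_star) for row in Q)
```

For a binary symmetric channel whose noise is the output of input 0, the two quantities coincide and no discontinuity is implied. For an asymmetric channel such as `Q = [[0.99, 0.01], [0.1, 0.9]]` with `q_star = [0.99, 0.01]`, the threshold strictly exceeds the ceiling, which is the situation covered by the corollary.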



D. Proof of Theorem 4

We first exhibit a coding scheme that achieves any $(R, \alpha)$ with $R \le C$ and

$$\alpha \le \max_{P \in \mathcal{P}^{\mathcal{X}}} \min_{W \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}} \max\{D(W\|Q|P),\, D(W\|Q_\star|P)\}.$$

All codewords start with a common preamble that is composed of $(\ln(n))^2$ repetitions of a symbol x such that $D(Q(\cdot|x)\|Q_\star) = \infty$ (such a symbol exists since $\alpha_\circ = \infty$). The next $(\ln(n))^3$ symbols of each codeword are drawn from a code that achieves a rate equal to $R - \epsilon$ on the synchronous channel. Finally, all the codewords end with a common large suffix $s^l$ of size $l = n - (\ln(n))^2 - (\ln(n))^3$ that has an empirical type P such that, for all $W \in \mathcal{P}_l^{\mathcal{Y}|\mathcal{X}}$, at least one of the following two inequalities holds:

$$D(W\|Q|P) \ge \alpha$$
$$D(W\|Q_\star|P) \ge \alpha.$$
The receiver runs two sequential decoders in parallel, and
makes a decision whenever one of the two decoders declares
a message. If the two decoders declare different messages at
the same time, the receiver declares one of the messages at
random.

The first decoder tries to identify the sent message by first
locating the preamble. At time $t$ it checks if the channel output
$y_t$ can be generated by $x$ but cannot be generated by noise,
i.e., if

$$Q(y_t | x) > 0 \quad \text{and} \quad Q(y_t | \star) = 0. \qquad (66)$$

If condition (66) does not hold, the decoder moves one step
ahead and checks condition (66) at time $t + 1$. If condition (66)
does hold, the decoder marks the current time as the beginning
of the "decoding window" and proceeds to the second step.
The second step consists in exactly locating and identifying the
sent codeword. Once the beginning of the decoding window
has been marked, the decoder makes a decision the first time
it observes $(\ln n)^3$ symbols that are typical with one of the
codewords. If no such time is found within $(\ln(n))^2 + (\ln(n))^3$
time steps from the time the decoding window has been
marked, the decoder declares a random message.
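The first step of this decoder, scanning for the earliest output that satisfies condition (66), can be sketched as follows. This is only an illustration under assumed inputs: the dictionaries `Q_x` and `Q_star` (output laws of the preamble symbol and of the idle symbol) and the function name `first_preamble_hit` are hypothetical, and the second, typicality-based step is not shown.

```python
def first_preamble_hit(outputs, Q_x, Q_star):
    """Return the first index t at which y_t satisfies condition (66):
    possible under the preamble symbol x, impossible under pure noise."""
    for t, y in enumerate(outputs):
        if Q_x[y] > 0 and Q_star[y] == 0:
            return t  # beginning of the "decoding window"
    return None  # condition (66) never held

# Hypothetical degenerate channel: noise never produces 'b'.
Q_star = {"a": 1.0, "b": 0.0}
Q_x = {"a": 0.3, "b": 0.7}

hit = first_preamble_hit("aaabab", Q_x, Q_star)
miss = first_preamble_hit("aaaa", Q_x, Q_star)
print(hit, miss)
```

Because noise can never produce an output that passes (66), a hit certifies that the preamble has started, which is why this rule never stops before the transmission begins.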

The purpose of the second decoder is to control the average
reaction delay by stopping the decoding process in the rare
event when the first decoder misses the codeword. Specifically,
the second "decoder" is only a stopping rule based on the
suffix $s^l$. At each time $t$ the second decoder checks whether
$D(\hat{P}_{y^t_{t-l+1}|s^l} \| Q | P) < \alpha$. If so, the decoder stops and declares
a random message. If not, the decoder moves one step ahead.

The arguments for proving that the coding scheme described
above achieves $(R, \alpha)$ provided

$$\alpha < \max_{P} \min_{W} \max\{D(W \| Q | P), D(W \| Q_\star | P)\} \qquad (67)$$

closely parallel those used to prove Theorem 2, and are
therefore omitted.

The converse is the second part of Theorem 3. ■

E. Proof of Theorem 6

1) Lower bound: To establish the lower bound in Theorem 6,
we exhibit a training-based scheme with preamble size
$\eta n$ with

$$\eta = 1 - R/C, \qquad (68)$$

and that achieves any rate-asynchronism pair $(R, \alpha)$ such that

$$\alpha \le m_1 \left(1 - \frac{R}{C}\right), \quad R \in (0, C], \qquad (69)$$

where

$$m_1 \triangleq \max_{P} \min_{W} \max\{D(W \| Q | P), D(W \| Q_\star | P)\}.$$

Fix $R \in (0, C]$ and let $\alpha$ satisfy (69). Each codeword starts
with a common preamble of size $\eta n$, where $\eta$ is given by (68),
and whose empirical distribution is equal to

$$P_p \triangleq \arg\max_{P} \left( \min_{W} \max\{D(W \| Q | P), D(W \| Q_\star | P)\} \right).$$

The remaining $(1 - \eta)n$ symbols of each codeword are i.i.d.
generated according to a distribution $P$ that almost achieves
capacity of the synchronous channel, i.e., such that $I(PQ) = C - \epsilon$
for some small $\epsilon > 0$.

Note that by (69) and (68), $\alpha$ is such that for any $W \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}_{\eta n}$
at least one of the following two inequalities holds:

$$D(W \| Q | P_p) \ge \alpha/\eta,$$
$$D(W \| Q_\star | P_p) \ge \alpha/\eta. \qquad (70)$$



The preamble detection rule is to stop the first time when the
last $\eta n$ output symbols $y^t_{t-\eta n+1}$ induce an empirical conditional
probability $\hat{P}_{y^t_{t-\eta n+1}|x^{\eta n}}$ such that

$$D(\hat{P}_{y^t_{t-\eta n+1}|x^{\eta n}} \| Q | P_p) \le D(\hat{P}_{y^t_{t-\eta n+1}|x^{\eta n}} \| Q_\star | P_p) \qquad (71)$$

where $x^{\eta n}$ is the preamble.

When the preamble is located, the decoder makes a decision
on the basis of the upcoming $(1 - \eta)n$ output symbols
using maximum likelihood decoding. If no preamble has been
located by time $A_n + n - 1$, the decoder declares a message
at random.
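The sliding-window rule (71) can be illustrated with a short sketch. It assumes full-support channel laws (so that both divergences are finite) and uses a hypothetical binary symmetric channel, an all-ones preamble, and noiseless-looking outputs; the function names are illustrative and the maximum likelihood decoding stage is not shown.

```python
import math
from collections import Counter

def cond_divergence(xs, ys, channel):
    """D(W || channel | P), with P the type of xs and W the empirical
    conditional type of ys given xs; channel[x][y] is a transition law."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    total = 0.0
    for (x, y), c in joint.items():
        q = channel[x][y]
        if q == 0:
            return math.inf
        total += (c / n) * math.log((c / x_counts[x]) / q)
    return total

def detect_preamble(preamble, outputs, Q, Q_star):
    """First time t at which the last len(preamble) outputs look more like
    the preamble sent through Q than like pure noise, as in rule (71)."""
    L = len(preamble)
    noise = {x: Q_star for x in set(preamble)}  # noise law ignores the input
    for t in range(L - 1, len(outputs)):
        window = outputs[t - L + 1 : t + 1]
        if cond_divergence(preamble, window, Q) <= cond_divergence(preamble, window, noise):
            return t
    return None

# Hypothetical binary symmetric channel with crossover 0.1; idle symbol is 0.
Q = {0: [0.9, 0.1], 1: [0.1, 0.9]}
Q_star = Q[0]
preamble = [1] * 5
outputs = [0] * 6 + [1] * 5 + [0] * 3  # preamble occupies indices 6..10

t_hit = detect_preamble(preamble, outputs, Q, Q_star)
print(t_hit)
```

In this toy run the rule fires at index 8, two steps before the preamble ends at index 10. An all-ones preamble resembles its own shifts, which is exactly the early-detection caveat that the error analysis addresses by requiring a large Hamming distance between the preamble and its shifts.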

We compute the reaction delay and the error probability.
For notational convenience, instead of the decoding time, we
consider the time $\tau_n$ that the decoder detects the preamble,
i.e., the first time $t$ such that (71) holds. The actual decoding
time occurs $(1 - \eta)n$ time instants after the preamble has been
detected, i.e., at time $\tau_n + (1 - \eta)n$.

Footnote to the proof of Theorem 4: In particular, note that the first decoder never stops before time $\nu$.

Footnote on $P_p$: $P_p$ need not be a valid type for finite values of $n$, but this small
discrepancy plays no role asymptotically since $P_p$ can be approximated
arbitrarily well with types of order sufficiently large.

For the reaction delay we have

$$\mathbb{E}(\tau_n - \nu)^+ = \mathbb{E}_1(\tau_n - \nu)^+$$
$$= \mathbb{E}_1[(\tau_n - \nu)^+ \mathbb{1}(\tau_n \ge \nu + \eta n)] + \mathbb{E}_1[(\tau_n - \nu)^+ \mathbb{1}(\tau_n \le \nu + \eta n - 1)]$$
$$\le (A_n + n - 1)\, \mathbb{P}_1(\tau_n \ge \nu + \eta n) + \eta n \qquad (72)$$

where, as usual, the subscript 1 in $\mathbb{E}_1$ and $\mathbb{P}_1$ indicates
conditioning on the event that message $m = 1$ is sent. A
similar computation as in (20) yields

$$\mathbb{P}_1(\tau_n \ge \nu + \eta n)
\le \mathbb{P}_1\bigl(D(\hat{P}_{y_\nu^{\nu+\eta n-1}|x^{\eta n}} \| Q | P_p) \ge \alpha/\eta\bigr)$$
$$\le \sum_{W \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}_{\eta n}:\; D(W \| Q | P_p) \ge \alpha/\eta} e^{-\eta n D(W \| Q | P_p)}$$
$$\le \mathrm{poly}(n)\, e^{-\alpha n}. \qquad (73)$$

The first inequality follows from the fact that event
$\{\tau_n \ge \nu + \eta n\}$ is included into event

$$\{D(\hat{P}_{y_\nu^{\nu+\eta n-1}|x^{\eta n}} \| Q | P_p) > D(\hat{P}_{y_\nu^{\nu+\eta n-1}|x^{\eta n}} \| Q_\star | P_p)\}$$

which, in turn, is included into event

$$\{D(\hat{P}_{y_\nu^{\nu+\eta n-1}|x^{\eta n}} \| Q | P_p) \ge \alpha/\eta\}$$

because of (70). The second inequality follows from Fact 2.
Hence, from (72) and (73),

$$\mathbb{E}(\tau_n - \nu)^+ \le \eta n + o(1) \qquad (74)$$

whenever $A_n = e^{n(\alpha - \epsilon)}$, $\epsilon > 0$. Since the actual decoding
time occurs $(1 - \eta)n$ time instants after $\tau_n$, where $\eta = 1 - R/C$,
and since the code used to transmit information achieves
the capacity of the synchronous channel, the above strategy
operates at rate $R$.

To show that the above strategy achieves vanishing error
probability, one uses arguments similar to those used to prove
Theorem 2 (see from the paragraph after (21) onwards), so the
proof is omitted. There is one small caveat in the analysis that
concerns the event when the preamble is located somewhat
earlier than its actual timing, i.e., when the decoder locates the
preamble over a time period $[t - \eta n + 1, \ldots, t]$ with
$\nu \le t \le \nu + \eta n - 2$. One way to make the probability of this event vanish
as $n \to \infty$ is to have the preamble have a "sufficiently large"
Hamming distance with any of its shifts. To guarantee this,
one just needs to modify the original preamble in a few (say,
$\log n$) positions. This modifies the preamble type negligibly.
For a detailed discussion on how to make this modification,
we refer the reader to [9], where the problem is discussed in
the context of sequential frame synchronization.

Each instance of the above random coding strategy satisfies
the conditions of Definition 3: there is a common preamble of
size $\eta n$ and the decoder decides to stop at any particular time $t$
based on $Y^t_{t-\eta n+1}$. We now show that there exists a particular
instance yielding the desired rate and error probability.

First note that the above rate analysis only depends on the
preamble, and not on the codebook that follows the preamble.
Hence, because the error probability, averaged over codebooks
and messages, vanishes, we deduce that there exists at least
one codebook that achieves rate R and whose error probability,
averaged over messages, tends to zero.

From this code, we remove codewords with poor error 
probability, say whose error probabilities are at least twice 
the average error probability. The resulting expurgated code 
has a rate that tends to R and a vanishing maximum error 
probability. 
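The expurgation step is a direct application of Markov's inequality, and can be sketched as below. The list of per-codeword error probabilities is made up for illustration, and `expurgate` is a hypothetical helper name; the threshold of twice the average follows the text.

```python
def expurgate(error_probs):
    """Keep the indices of codewords whose error probability is below twice
    the average. By Markov's inequality at least half of them survive."""
    avg = sum(error_probs) / len(error_probs)
    if avg == 0:
        return list(range(len(error_probs))), avg  # nothing to remove
    kept = [m for m, e in enumerate(error_probs) if e < 2 * avg]
    return kept, avg

# Hypothetical per-codeword error probabilities.
errors = [0.01, 0.02, 0.0, 0.3, 0.01, 0.02, 0.0, 0.04]
kept, avg = expurgate(errors)
max_error_after = max(errors[m] for m in kept)
print(kept, avg, max_error_after)
```

Dropping at most half of the codewords reduces $\ln M$ by at most $\ln 2$, which is why the rate of the expurgated code still tends to R while its maximum error probability is at most twice the original average.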

2) Upper bound: To establish the upper bound it suffices
to show that for training-based schemes $(R, \alpha)$ with $R > 0$
must satisfy

$$\alpha \le m_2 \left(1 - \frac{R}{C}\right). \qquad (75)$$

The upper bound in Theorem 6 then follows from (75) and
the general upper bound derived in Theorem 3.

The upper bound (75) follows from the following lemma:

Lemma 1: A rate $R > 0$ coding scheme whose decoder
operates according to a sliding window stopping rule with
window size $\eta n$ cannot achieve an asynchronism exponent
larger than $\eta m_2$.

Lemma 1 says that any coding scheme with a limited memory
stopping rule capable of processing only $\eta n$ symbols at a time
achieves an asynchronism exponent at most $O(\eta)$, unless $R = 0$
or if the channel is degenerate, i.e., $\alpha_\circ = m_2 = \infty$, in which
case Lemma 1 is trivial and we have the asynchronous capacity
expression given by Theorem 4.

To deduce (75) from Lemma 1, consider a training-based
scheme which achieves a delay $\Delta$ with a non-trivial error
probability (i.e., bounded away from 0). Because the preamble
conveys no information, the rate is at most

$$C\, \frac{\min\{\Delta, n\} - \eta n}{\Delta} \le C(1 - \eta)$$

by the channel coding theorem for a synchronous channel.
Hence, for a rate $R > 0$ training-based scheme the training
fraction $\eta$ is upper bounded as

$$\eta \le 1 - \frac{R}{C}.$$

This implies (75) by Lemma 1. ■
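To make the bound (75) concrete, the sketch below evaluates it for a toy noise distribution. It assumes $m_2 = -\ln(\min_y Q_\star(y))$, which is consistent with the per-sequence bound used in the proof of Lemma 1 but is not restated in this section; the numbers for `R` and `C` and the function name are hypothetical.

```python
import math

def training_exponent_cap(Q_star, R, C):
    """Evaluate m2 * (1 - R/C), the right-hand side of (75), assuming
    m2 = -ln(min_y Q_star(y)) and a full-support noise distribution."""
    if not 0 < R <= C:
        raise ValueError("rate must satisfy 0 < R <= C")
    q_min = min(Q_star)
    if q_min == 0:
        return math.inf  # degenerate channel: the bound is vacuous
    m2 = -math.log(q_min)
    eta_max = 1 - R / C  # largest admissible training fraction
    return m2 * eta_max

cap_half = training_exponent_cap([0.9, 0.1], R=0.2, C=0.4)
cap_full = training_exponent_cap([0.9, 0.1], R=0.4, C=0.4)
print(cap_half, cap_full)
```

The cap shrinks linearly to zero as R approaches C, which matches the observation that the penalty of training-based schemes is most pronounced in the high-rate regime.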
Proof of Lemma 1: The lemma holds trivially if $m_2 = \infty$.
We thus assume that $m_2 < \infty$. Consider a training-based
scheme $\{(\mathcal{C}_n, (\tau_n, \phi_n))\}_{n \ge 1}$ in the sense of Definition 3. For
notational convenience, we consider $\tau_n$ to be the time when
the decoder detects the preamble. The actual decoding time (in
the sense of Definition 3, part 2) occurs $(1 - \eta)n$ time instants
after the preamble has been detected, i.e., at time $\tau_n + (1 - \eta)n$.
This allows us to write $\tau_n$ as

$$\tau_n = \inf\{t \ge 1 : S_t = 1\},$$

where

$$S_t = S_t(Y^t_{t-\eta n+1}), \quad 1 \le t \le A_n + n - 1,$$

referred to as the "stopping rule at time $t$," is a binary random
variable such that $\{S_t = 1\}$ represents the set of output
sequences $y^t_{t-\eta n+1}$ which make $\tau_n$ stop at time $t$, assuming
that $\tau_n$ hasn't stopped before time $t$.

Now, every sequence $y^{\eta n} \in \mathcal{Y}^{\eta n}$ satisfies

$$Q_\star(y^{\eta n}) \ge e^{-m_2 \eta n}.$$

Therefore, any deterministic stopping rule stops at any particular
time either with probability zero or with probability
at least $e^{-m_2 \eta n}$, i.e., for all $t$, either the stopping rule $S_t$
satisfies $\mathbb{P}_\star(S_t = 1) \ge e^{-m_2 \eta n}$ or it is trivial in the sense
that $\mathbb{P}_\star(S_t = 1) = 0$. For now, we assume that the stopping
rule is deterministic; the randomized case follows easily as we
describe at the end of the proof.

Let $\mathcal{S}$ denote the subset of indices $t \in \{1, 2, \ldots, A_n/4\}$
such that $S_t$ is non-trivial, and let $\mathcal{S}_k$ denote the subset of
indices in $\mathcal{S}$ that are congruent to $k$ mod $\eta n$, i.e.,

$$\mathcal{S}_k = \{t : t \in \mathcal{S},\; t = j \cdot \eta n + k,\; j = 0, 1, \ldots\}.$$

Note that for each $k$, the stopping rules $S_t$, $t \in \mathcal{S}_k$, are
independent since $S_t$ depends only on $y^t_{t-\eta n+1}$.

By repeating the same argument as in (41)-(42), for any
$\epsilon > 0$, for all $n$ large enough and any message index $m$ the
error probability $\mathbb{P}_m(\mathcal{E})$ satisfies

$$\epsilon \ge \mathbb{P}_m(\mathcal{E}) \ge \tfrac{1}{4}\, \mathbb{P}_\star(\tau_n \le A_n/2). \qquad (76)$$

Since $\epsilon > 0$ is arbitrary, we deduce

$$\mathbb{P}_\star(\tau_n \ge A_n/2) \ge 1/2 \qquad (77)$$

i.e., a coding scheme achieves a vanishing error probability
only if the probability of stopping after time $A_n/2$ is at least
0.5 when the channel input is all $\star$'s. Thus, assuming that our
coding scheme achieves vanishing error probability, we have

$$|\mathcal{S}| < \eta n\, e^{m_2 \eta n}.$$

To see this, note that if $|\mathcal{S}| \ge \eta n\, e^{m_2 \eta n}$, then there exists a
value $k^*$ such that $|\mathcal{S}_{k^*}| \ge e^{m_2 \eta n}$, and hence

$$\mathbb{P}_\star(\tau_n \ge A_n/2) \le \mathbb{P}_\star(S_t = 0,\; t \in \mathcal{S}_{k^*})$$
$$\le (1 - e^{-m_2 \eta n})^{|\mathcal{S}_{k^*}|}$$
$$\le (1 - e^{-m_2 \eta n})^{e^{m_2 \eta n}}.$$

Since the last term tends to $1/e < 1/2$ as $n \to \infty$,
we get $\mathbb{P}_\star(\tau_n \ge A_n/2) < 1/2$ for $n$ large enough, which is in
conflict with the assumption that the coding scheme achieves
vanishing error probability.
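The limit invoked in this step, $(1 - e^{-x})^{e^{x}} \to 1/e$ with $x = m_2 \eta n$, is easy to confirm numerically. The following sketch is purely illustrative; the function name `no_stop_bound` is hypothetical, and `log1p` is used only to keep the computation accurate for large $x$.

```python
import math

def no_stop_bound(x):
    """Evaluate (1 - e^{-x}) ** (e^{x}), the upper bound on the probability
    that e^{x} independent non-trivial stopping rules all stay silent."""
    p = math.exp(-x)
    return math.exp(math.log1p(-p) / p)

values = {x: no_stop_bound(x) for x in (1, 5, 20)}
print(values, 1 / math.e)
```

The values increase toward $1/e \approx 0.368$ from below, so the bound is already under 1/2 for every positive $x$ in this range.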

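The fact that $|\mathcal{S}| < \eta n\, e^{m_2 \eta n}$ implies, as we shall prove
later, that

$$\mathbb{P}(\tau_n \ge A_n/2 \mid \nu \le A_n/4) \ge \frac{1}{2}\left(1 - \frac{8 \eta n^2 e^{m_2 \eta n}}{A_n}\right). \qquad (78)$$

Hence,

$$\mathbb{E}(\tau_n - \nu)^+ \ge \mathbb{E}((\tau_n - \nu)^+ \mid \tau_n \ge A_n/2,\; \nu \le A_n/4) \times \mathbb{P}(\tau_n \ge A_n/2,\; \nu \le A_n/4)$$
$$\ge \frac{A_n}{16}\, \mathbb{P}(\tau_n \ge A_n/2 \mid \nu \le A_n/4)$$
$$\ge \frac{A_n}{32}\left(1 - \frac{8 \eta n^2 e^{m_2 \eta n}}{A_n}\right) \qquad (79)$$

where for the second inequality we used the fact that $\nu$ is
uniformly distributed, and where the third inequality holds by
(78). Letting $A_n = e^{\alpha n}$, from (79) we deduce that if $\alpha > m_2 \eta$,
then $\mathbb{E}(\tau_n - \nu)^+$ grows exponentially with $n$, implying
that the rate is asymptotically zero. Hence a sliding window
stopping rule which operates on a window of size $\eta n$ cannot
accommodate a positive rate while achieving an asynchronism
exponent larger than $\eta m_2$. This establishes the desired result.

We now show (78). Let $\mathcal{N}$ be the subset of indices in
$\{1, 2, \ldots, A_n/4\}$ with the following property. For any $t \in \mathcal{N}$,
the $2n$ indices $\{t, t + 1, \ldots, t + 2n - 1\}$ do not belong to $\mathcal{S}$,
i.e., all $2n$ of the associated stopping rules are trivial. Then
we have

$$\mathbb{P}(\tau_n \ge A_n/2 \mid \nu \le A_n/4) \ge \mathbb{P}(\tau_n \ge A_n/2 \mid \nu \in \mathcal{N}) \times \mathbb{P}(\nu \in \mathcal{N} \mid \nu \le A_n/4)$$
$$= \mathbb{P}(\tau_n \ge A_n/2 \mid \nu \in \mathcal{N})\, \frac{|\mathcal{N}|}{A_n/4} \qquad (80)$$

since $\nu$ is uniformly distributed. Using that $|\mathcal{S}| < \eta n\, e^{m_2 \eta n}$,

$$|\mathcal{N}| \ge A_n/4 - 2 \eta n^2 e^{m_2 \eta n},$$

hence from (80)

$$\mathbb{P}(\tau_n \ge A_n/2 \mid \nu \le A_n/4) \ge \mathbb{P}(\tau_n \ge A_n/2 \mid \nu \in \mathcal{N}) \left(1 - \frac{8 \eta n^2 e^{m_2 \eta n}}{A_n}\right). \qquad (81)$$

Now, when $\nu \in \mathcal{N}$, all stopping times that could potentially
depend on the transmitted codeword symbols are actually
trivial, so the event $\{\tau_n \ge A_n/2\}$ is independent of the
symbols sent at times $\nu, \nu + 1, \ldots, \nu + n - 1$. Therefore,

$$\mathbb{P}(\tau_n \ge A_n/2 \mid \nu \in \mathcal{N}) = \mathbb{P}_\star(\tau_n \ge A_n/2). \qquad (82)$$

Combining (82) with (81), together with (77), gives the desired claim (78).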

Finally, to see that randomized stopping rules also cannot
achieve asynchronism exponents larger than $\eta m_2$, note
that a randomized stopping rule can be viewed as simply a
probability distribution over deterministic stopping rules. The
previous analysis shows that for any deterministic stopping
rule, and any asynchronism exponent larger than $\eta m_2$, either
the probability of error is large (e.g., at least 1/8), or the
expected delay is exponential in $n$. Therefore, the same holds
for randomized stopping rules. ■

F. Comments on Error Criteria 

We end this section by commenting on maximum versus 
average rate/error probability criteria. The results in this paper 
consider the rate defined with respect to maximum (over mes- 
sages) reaction delay and consider maximum (over messages) 
error probability. Hence all the achievability results also hold
when delay and error probability are averaged over messages.

To see that the converse results in this paper also hold for
the average case, we use the following standard expurgation
argument. Assume $\{(\mathcal{C}_n, (\tau_n, \phi_n))\}$ is an $(R, \alpha)$ coding scheme
where the error probability and the delay of $(\mathcal{C}_n, (\tau_n, \phi_n))$ are
defined as

$$\frac{1}{M} \sum_{m=1}^{M} \mathbb{P}_m(\mathcal{E})$$

and

$$\frac{1}{M} \sum_{m=1}^{M} \mathbb{E}_m(\tau_n - \nu)^+$$

respectively. By definition of an $(R, \alpha)$ coding scheme, this
means that given some arbitrarily small $\epsilon > 0$, and for all $n$
large enough,

$$\frac{1}{M} \sum_{m=1}^{M} \mathbb{P}_m(\mathcal{E}) \le \epsilon$$

and

$$\frac{\ln M}{\frac{1}{M} \sum_{m=1}^{M} \mathbb{E}_m(\tau_n - \nu)^+} \ge R - \epsilon.$$

Hence, for $n$ large enough and any $\delta > 1$, one can find a
(nonzero) constant fraction of codewords $\mathcal{C}_n' \subset \mathcal{C}_n$ ($\mathcal{C}_n'$ is the
"expurgated" ensemble) that satisfies the following property:
the rate defined with respect to maximum (over $\mathcal{C}_n'$) delay is at
least $(R - \epsilon)/\delta$ and the maximum error probability is less than
$\eta \epsilon$, where $\eta = \eta(\delta) > 0$. One then applies the converse results
to the expurgated ensemble to derive bounds on $(R/\delta, \alpha)$, and
thus on $(R, \alpha)$, since $\delta > 1$ can be chosen arbitrarily.

Footnote to the proof of Lemma 1: Any coding scheme that achieves vanishing error probability cannot have
$\ln M$ grow faster than linearly with $n$, because of the limitation imposed
by the capacity of the synchronous channel. Hence, if $\mathbb{E}(\tau_n - \nu)^+$ grows
exponentially with $n$, the rate goes to zero exponentially with $n$.

VI. Concluding Remarks 

We analyzed a model for asynchronous communication 
which captures the situation when information is emitted 
infrequently. General upper and lower bounds on capacity 
were derived, which coincide in certain cases. The forms 
of these bounds are similar and have two parts: a mutual 
information part and a divergence part. The mutual information 
part is reminiscent of synchronous communication: to achieve 
a certain rate, there must be, on average, enough mutual 
information between the time information is sent and the time 
it is decoded. The divergence part is novel, and comes from 
asynchronism. Asynchronism introduces two additional error
events that must be overcome by the decoder. The first event
happens when the noise produces a channel output that looks 
as if it was generated by a codeword. The larger the level 
of asynchronism, the more likely this event becomes. The 
second event happens when the channel behaves atypically, 
which results in the decoder missing the codeword. When this 
event happens, the rate penalty is huge, on the order of the 
asynchronism level. As such, the second event contributes to 
increased average reaction delay, or equivalently, lowers the 
rate. The divergence part in our upper and lower bounds on 
capacity strikes a balance between these two events. 

An important conclusion of our analysis is that, in general, 
training-based schemes are not optimal in the high rate, high 
asynchronism regime. In this regime, training-based architec- 
tures are unreliable, whereas it is still possible to achieve 
an arbitrarily low probability of error using strategies that 
combine synchronization with information transmission. 

Finally, we note that further analysis is possible when we
restrict attention to a simpler slotted communication model
in which the possible transmission slots are nonoverlapping
and contiguous. In particular, for this more constrained model
[?] develops a variety of results, among which is that except
in somewhat pathological cases, training-based schemes are
strictly suboptimal at all rates below the synchronous capacity.
Additionally, the performance gap is quantified for the special
cases of the binary symmetric and additive white Gaussian
noise channels, where it is seen to be significant in the high
rate regime but vanish in the limit of low rates. Whether the
characteristics observed for the slotted model are also shared
by unslotted models remains to be determined, and is a natural
direction for future research.

Appendix A
Proof of Remark 2 (p. 14)

To show that the random coding scheme proposed in the
proof of Theorem 2 achieves (6) with equality, we show that

$$\alpha \le \max_{P :\, I(PQ) \ge R}\; \min_{V \in \mathcal{P}^{\mathcal{Y}}} \max\{D(V \| (PQ)_{\mathcal{Y}}), D(V \| Q_\star)\}. \qquad (83)$$

Recall that, by symmetry of the encoding and decoding
procedures, the average reaction delay is the same for any
message. Hence

$$\mathbb{E}(\tau_n - \nu)^+ = \mathbb{E}_1(\tau_n - \nu)^+,$$

where $\mathbb{E}_1$ denotes expectation under the probability measure
$\mathbb{P}_1$, the channel output distribution when message 1 is sent,
averaged over time and codebooks.

Suppose for the moment that

$$\mathbb{E}_1(\tau_n - \nu)^+ \ge n(1 - o(1)) \quad (n \to \infty). \qquad (84)$$

It then follows from Fano's inequality that the input distribution
$P$ must satisfy $I(PQ) \ge R$. Hence, to establish (83) we
will show that at least one of the following inequalities

$$D(V \| (PQ)_{\mathcal{Y}}) \ge \alpha$$
$$D(V \| Q_\star) \ge \alpha \qquad (85)$$

holds for any $V \in \mathcal{P}^{\mathcal{Y}}$. The arguments are similar to those
used to establish Claim 3 of Theorem 3. Below we provide
the key steps.

We proceed by contradiction and show that if both the
inequalities in (85) are reversed, then the asymptotic rate is
zero. To that aim we provide a lower bound on $\mathbb{E}_1(\tau_n - \nu)^+$.

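Let $\tau_n'$ denote the time of the beginning of the decoding
window, i.e., the first time when the previous $n$ output symbols
have empirical distribution $\hat{P}$ such that $D(\hat{P} \| Q_\star) \ge \alpha$. By
definition, $\tau_n \ge \tau_n'$, so

$$\mathbb{E}_1(\tau_n - \nu)^+ \ge \mathbb{E}_1(\tau_n' - \nu)^+ \ge \frac{1}{3} \sum_{t=1}^{A_n/3} \mathbb{P}_{1,t}(\tau_n' \ge 2A_n/3), \qquad (86)$$

where the second inequality follows from Markov's inequality,
and where $\mathbb{P}_{1,t}$ denotes the probability measure at the output
of the channel conditioned on the event that message 1 starts
being sent at time $t$, and averaged over codebooks. Note that,
because $\tau_n'$ is not a function of the codebook, there is no
averaging on the stopping times.

Footnote: For different codebook realizations, stopping rule $\tau_n'$ is the same, by
contrast with $\tau_n$ which depends on the codebook via the joint typicality
criterion of the second phase.

Fix $V \in \mathcal{P}^{\mathcal{Y}}$. We lower bound each term $\mathbb{P}_{1,t}(\tau_n' \ge 2A_n/3)$
in the above sum as

$$\mathbb{P}_{1,t}(\tau_n' \ge 2A_n/3) \ge \mathbb{P}_{1,t}(\tau_n' \ge 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_V)\, \mathbb{P}_{1,t}(Y_t^{t+n-1} \in \mathcal{T}_V)$$
$$\ge \mathbb{P}_{1,t}(\tau_n' \ge 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_V)\, e^{-n D_1}\, \mathrm{poly}(n), \qquad (87)$$

where $D_1 \triangleq D(V \| (PQ)_{\mathcal{Y}})$, and where the second inequality
follows from Fact 2.

The key change of measure step (37) results now in the
equality

$$\mathbb{P}_{1,t}(\tau_n' \ge 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_V) = \mathbb{P}_\star(\tau_n' \ge 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_V) \qquad (88)$$

which can easily be checked by noticing that the probability
of any sequence $y_t^{t+n-1}$ in $\mathcal{T}_V$ is the same under $\mathbb{P}_{1,t}$.
Substituting (88) into the right-hand side of (87), and using
(86) and Fact 2, we get

$$\mathbb{E}_1(\tau_n - \nu)^+ \ge e^{-n(D_1 - D_2)}\, \mathrm{poly}(n) \times \sum_{t=1}^{A_n/3} \mathbb{P}_\star(\tau_n' \ge 2A_n/3,\; Y_t^{t+n-1} \in \mathcal{T}_V), \qquad (89)$$

where $D_2 \triangleq D(V \| Q_\star)$. The rest of the proof consists in
showing that if the two inequalities in (85) are reversed, then
the right-hand side of the above inequality grows exponentially
with $n$, which results in an asymptotic rate equal to zero.
The arguments closely parallel the ones that prove Claim 3 of
Theorem 3 (see from (38) onwards), and hence are omitted.

To conclude the proof we show (84). Using the alternate
form of expectation for non-negative random variables
$\mathbb{E}X = \sum_{k \ge 0} \mathbb{P}(X \ge k)$, we have

$$\mathbb{E}_1(\tau_n - \nu)^+ \ge \sum_{i=1}^{g(n)} \mathbb{P}_1(\tau_n \ge \nu + i)$$
$$\ge \sum_{i=1}^{g(n)} (1 - \mathbb{P}_1(\tau_n < \nu + i))$$
$$\ge g(n)\,(1 - \mathbb{P}_1(\tau_n < \nu + g(n))),$$

where we defined

$$g(n) \triangleq n - \lceil n^{3/4} \rceil$$

and where the last inequality follows from the fact that
$\mathbb{P}_1(\tau_n < \nu + i)$ is a non-decreasing function of $i$. Since
$g(n) = n(1 - o(1))$, to establish (84) it suffices to show that

$$\mathbb{P}_1(\tau_n < \nu + g(n)) = o(1) \quad (n \to \infty). \qquad (90)$$

Since

$$\mathbb{P}_1(\tau_n < \nu) = o(1) \quad (n \to \infty),$$

as follows from computation steps in (22) and (23), to establish
(90) it suffices to show that

$$\mathbb{P}_1(\nu \le \tau_n < \nu + g(n)) = o(1) \quad (n \to \infty). \qquad (91)$$

For $i \in \{0, 1, \ldots, g(n)\}$ we have

$$\mathbb{P}_1(\tau_n = \nu + i) \le \sum_{J} \mathbb{P}_1\bigl(\hat{P}_{C^n(1),\, Y^{\nu+i}_{\nu+i-n+1}} = J\bigr) \qquad (92)$$

where the above summation is over all typical joint types, i.e.,
all $J \in \mathcal{P}^{\mathcal{X},\mathcal{Y}}_n$ such that

$$|\hat{P}_{C^n(1)}(a)\, Q(b|a) - J(a, b)| \le \mu \qquad (93)$$

for all $(a, b) \in \mathcal{X} \times \mathcal{Y}$.

We upper bound each term in this summation. First observe
that event

$$\{\hat{P}_{C^n(1),\, Y^{\nu+i}_{\nu+i-n+1}} = J\},$$

for $i \in \{0, 1, \ldots, g(n)\}$, involves random vector $Y^{\nu+i}_{\nu+i-n+1}$
which is partly generated by noise and partly generated by
the transmitted codeword corresponding to message 1. In the
following computation $k$ refers to the first symbols of $Y^{\nu+i}_{\nu+i-n+1}$
which are generated by noise, i.e., by definition $k = n - (i + 1)$.
Note that since $0 \le i \le g(n)$, we have

$$\lceil n^{3/4} \rceil - 1 \le k \le n - 1.$$

We have

$$\mathbb{P}_1\bigl(\hat{P}_{C^n(1),\, Y^{\nu+i}_{\nu+i-n+1}} = J\bigr)
= \sum_{\substack{J_1 \in \mathcal{P}_k,\; J_2 \in \mathcal{P}_{n-k} \\ k J_1 + (n-k) J_2 = n J}}
\Bigl( \sum_{(x^k, y^k):\, \hat{P}_{x^k, y^k} = J_1} P(x^k)\, Q_\star(y^k) \Bigr)
\times \Bigl( \sum_{(x^{n-k}, y^{n-k}):\, \hat{P}_{x^{n-k}, y^{n-k}} = J_2} P(x^{n-k}, y^{n-k}) \Bigr), \qquad (94)$$

where we used the following shorthand notations for probabilities:

$$P(x^k) \triangleq \prod_{j=1}^{k} P(x_j), \qquad
Q_\star(y^k) \triangleq \prod_{j=1}^{k} Q_\star(y_j), \qquad
P(x^{n-k}, y^{n-k}) \triangleq \prod_{j=1}^{n-k} P(x_j)\, Q(y_j | x_j).$$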



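Further, using Fact 2,

$$\sum_{x^k:\, \hat{P}_{x^k} = J_{1,x}}\; \sum_{y^k:\, \hat{P}_{y^k} = J_{1,y}} P(x^k)\, Q_\star(y^k)
\le e^{-k (D(J_{1,x} \| P) + D(J_{1,y} \| Q_\star))}
\le e^{-k D(J_{1,y} \| Q_\star)} \qquad (95)$$

where $J_{1,x}$ and $J_{1,y}$ denote the left and right marginals of
$J_1$, respectively, and where the second inequality follows by
non-negativity of divergence.
A similar calculation yields

$$\sum_{(x^{n-k}, y^{n-k}):\, \hat{P}_{x^{n-k}, y^{n-k}} = J_2} P(x^{n-k}, y^{n-k}) \le e^{-(n-k) D(J_2 \| PQ)}. \qquad (96)$$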

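From (94), (95), (96) and Fact 1 we get

$$\mathbb{P}_1\bigl(\hat{P}_{C^n(1),\, Y^{\nu+i}_{\nu+i-n+1}} = J\bigr)
\le \mathrm{poly}(n) \max_{\substack{k J_1 + (n-k) J_2 = n J \\ k:\, \lceil n^{3/4} \rceil - 1 \le k \le n-1}}
\exp\bigl[ -k D(J_{1,y} \| Q_\star) - (n-k) D(J_2 \| PQ) \bigr]. \qquad (97)$$

The maximum on the right-hand side of (97) is equal to

$$\max_{\substack{k J_1 + (n-k) J_2 = n J_y \\ k:\, \lceil n^{3/4} \rceil - 1 \le k \le n-1}}
\exp\bigl[ -k D(J_1 \| Q_\star) - (n-k) D(J_2 \| (PQ)_{\mathcal{Y}}) \bigr], \qquad (98)$$

where in (98) $J_1$ and $J_2$ range over output types and $J_y$ denotes the right marginal of $J$.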



We upper bound the argument of the above exponential via
the log-sum inequality to get

$$-k D(J_1 \| Q_\star) - (n-k) D(J_2 \| (PQ)_{\mathcal{Y}}) \le -n D(J_y \| \delta Q_\star + (1-\delta)(PQ)_{\mathcal{Y}}), \qquad (99)$$



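where $\delta = k/n$. Using (99), we upper-bound expression (98)
by

$$\max_{\delta:\, n^{-1/4} - n^{-1} \le \delta \le 1} \exp\bigl[ -n D(J_y \| \delta Q_\star + (1-\delta)(PQ)_{\mathcal{Y}}) \bigr]
\le \max_{\delta:\, n^{-1/4} - n^{-1} \le \delta \le 1} \exp[-\Omega(n \delta^2)]
\le \exp[-\Omega(n^{1/2})] \qquad (100)$$

where for the first inequality we used Pinsker's inequality [7,
Problem 17, p. 58]

$$D(P_1 \| P_2) \ge \frac{1}{2 \ln 2} \| P_1 - P_2 \|^2,$$

and assume that $\mu$ is small enough and $n$ is large enough for
this inequality to be valid. Such $\mu$ and $n$ exist whenever the
distributions $Q_\star$ and $(PQ)_{\mathcal{Y}}$ are different.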
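It then follows from (100) that

$$\mathbb{P}_1\bigl(\hat{P}_{C^n(1),\, Y^{\nu+i}_{\nu+i-n+1}} = J\bigr) \le \exp[-\Omega(n^{1/2})],$$

hence, from (92) and Fact 1 we get

$$\mathbb{P}_1(\tau_n = \nu + i) \le \exp[-\Omega(n^{1/2})]$$

for $i \in \{0, 1, \ldots, g(n)\}$. Finally a union bound over times
yields the desired result (90) since $g(n) = O(n)$.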
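The log-sum step (99) is the joint convexity of divergence: with $\delta = k/n$, the weighted sum of the two divergences dominates the divergence between the corresponding mixtures. A quick numerical spot check, using made-up distributions in place of $J_1$, $J_2$, $Q_\star$ and $(PQ)_{\mathcal{Y}}$, can be sketched as follows; the helper names are hypothetical.

```python
import math

def kl(p, q):
    """Divergence D(p||q) in nats for distributions with common support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mix(a, b, delta):
    """Convex combination delta * a + (1 - delta) * b."""
    return [delta * x + (1 - delta) * y for x, y in zip(a, b)]

def log_sum_gap(J1, J2, Q_star, PQ_y, delta):
    """Weighted divergences minus divergence of the mixtures;
    non-negative by the log-sum inequality."""
    weighted = delta * kl(J1, Q_star) + (1 - delta) * kl(J2, PQ_y)
    mixed = kl(mix(J1, J2, delta), mix(Q_star, PQ_y, delta))
    return weighted - mixed

# Hypothetical output types and output distributions.
gap = log_sum_gap([0.7, 0.3], [0.2, 0.8], [0.9, 0.1], [0.5, 0.5], delta=0.25)
print(gap)
```

Multiplying the non-negative gap by $n$ and rearranging gives exactly the form used in (99).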

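Appendix B
Proof of Theorem 5

The desired theorem is a stronger version of [7, Corollary 1.9, p. 107],
and its proof closely follows the proof of the latter.

Before proceeding, we recall the definitions of $\eta$-image and
$l$-neighborhood of a set of sequences.

Definition 4 ($\eta$-image, [7, Definition 2.1.2, p. 101]): A set
$\mathcal{B} \subset \mathcal{Y}^n$ is an $\eta$-image of a set $\mathcal{A} \subset \mathcal{X}^n$ if
$Q(\mathcal{B} | x^n) \ge \eta$ for all $x^n \in \mathcal{A}$. The minimum cardinality
of $\eta$-images of $\mathcal{A}$ is denoted $g_Q(\mathcal{A}, \eta)$.

Definition 5 ($l$-neighborhood, [7, p. 86]): The $l$-neighborhood
of a set $\mathcal{B} \subset \mathcal{Y}^n$ is the set

$$\Gamma^l \mathcal{B} \triangleq \{y^n \in \mathcal{Y}^n : d_H(\{y^n\}, \mathcal{B}) \le l\}$$

where $d_H(\{y^n\}, \mathcal{B})$ denotes the Hamming distance between
$y^n$ and $\mathcal{B}$, i.e.,

$$d_H(\{y^n\}, \mathcal{B}) = \min_{\tilde{y}^n \in \mathcal{B}} d_H(y^n, \tilde{y}^n).$$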

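As other notation, for a given conditional probability $Q(y|x)$,
$(x, y) \in \mathcal{X} \times \mathcal{Y}$, and $x^n \in \mathcal{X}^n$, we define the set

$$\mathcal{T}_{[Q]}(x^n) = \{y^n \in \mathcal{Y}^n : |\hat{P}_{x^n, y^n}(a, b) - \hat{P}_{x^n}(a)\, Q(b|a)| \le q,\; \forall (a, b) \in \mathcal{X} \times \mathcal{Y}\}$$

where $q \in (0, 1/2)$. To establish Theorem 5, we make use of
the following three lemmas. Since we restrict attention to block
coding schemes, i.e., coding schemes whose decoding happens
at the fixed time $n$, we denote them simply by $(\mathcal{C}_n, \phi_n)$ instead
of $(\mathcal{C}_n, (\tau_n, \phi_n))$.

In the following, $\epsilon_n$ is always given by

$$\epsilon_n = (n + 1)^{|\mathcal{X}| \cdot |\mathcal{Y}|} \exp(-n q^2 / (2 \ln 2)).$$

Lemma 2: Given $\gamma \in (0, 1)$, $Q \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$, $P \in \mathcal{P}^{\mathcal{X}}_n$, and
$\mathcal{A} \subset \mathcal{T}_P$, there exist $(\mathcal{C}_n, \phi_n)$ for each $n \ge n_0(\gamma, q, |\mathcal{X}|, |\mathcal{Y}|)$
such that

1) $c^n(m) \in \mathcal{A}$, for all $c^n(m) \in \mathcal{C}_n$

2) $\phi_n^{-1}(m) \subset \mathcal{T}_{[Q]}(c^n(m))$, $m \in \{1, 2, \ldots, M\}$

3) the maximum error probability is upper bounded by $2\epsilon_n$

4) the rate satisfies

$$\frac{1}{n} \ln |\mathcal{C}_n| \ge \frac{1}{n} \ln g_Q(\mathcal{A}, \epsilon_n) - H(Q|P) - \gamma.$$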

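Proof of Lemma 2: The proof closely follows the proof of
[7, Lemma 1.3, p. 101] since it essentially suffices to replace
$\epsilon$ and $\gamma$ in the proof of [7, Lemma 1.3, p. 101] with $2\epsilon_n$ and
$\epsilon_n$, respectively. We therefore omit the details here.
One of the steps of the proof consists in showing that

$$Q(\mathcal{T}_{[Q]}(x^n) | x^n) \ge 1 - \epsilon_n \qquad (101)$$

for all $x^n \in \mathcal{X}^n$. To establish this, one proceeds as follows.
Given $P \in \mathcal{P}^{\mathcal{X}}_n$, let $\mathcal{D}$ denote the set of empirical conditional
distributions $W(y|x) \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}_n$ such that

$$|\hat{P}_{x^n}(a)\, W(b|a) - \hat{P}_{x^n}(a)\, Q(b|a)| \le q$$

for all $(a, b) \in \mathcal{X} \times \mathcal{Y}$. We have

$$1 - Q(\mathcal{T}_{[Q]}(x^n) | x^n) = \sum_{W \in \mathcal{D}^c} Q(\mathcal{T}_W(x^n) | x^n) \qquad (102)$$
$$\le \sum_{W \in \mathcal{D}^c} e^{-n D(W \| Q | P)} \qquad (103)$$
$$\le (n + 1)^{|\mathcal{X}| \cdot |\mathcal{Y}|} \exp\bigl(-n \min_{W \in \mathcal{D}^c} D(W \| Q | P)\bigr) \qquad (104)$$
$$\le (n + 1)^{|\mathcal{X}| \cdot |\mathcal{Y}|} \exp\bigl(-n \min_{W \in \mathcal{D}^c} \| PW - PQ \|^2 / (2 \ln 2)\bigr) \qquad (105)$$
$$\le (n + 1)^{|\mathcal{X}| \cdot |\mathcal{Y}|} \exp(-n q^2 / (2 \ln 2)) \qquad (106)$$

which shows (101). Inequality (103) follows from Fact 3, (104)
follows from Fact 1, (105) follows from Pinsker's inequality
(see, e.g., [7, Problem 17, p. 58]), and (106) follows from the
definition of $\mathcal{D}$. ■

Lemma 3 ([7, Lemma 1.4, p. 104]): For every $\epsilon, \gamma \in (0, 1)$,
if $(\mathcal{C}_n, \phi_n)$ achieves an error probability $\epsilon$ and $\mathcal{C}_n \subset \mathcal{T}_P$,
then

$$\frac{1}{n} \ln |\mathcal{C}_n| < \frac{1}{n} \ln g_Q(\mathcal{C}_n, \epsilon + \gamma) - H(Q|P) + \gamma$$

whenever $n \ge n_0(|\mathcal{X}|, |\mathcal{Y}|, \gamma)$.

Since this lemma is established in [7, Lemma 1.4, p. 104], we
omit its proof.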

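Lemma 4: For every $\gamma > 0$, $\epsilon \in (0, 1)$, $Q \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$, and
$\mathcal{A} \subset \mathcal{X}^n$

$$\left| \frac{1}{n} \ln g_Q(\mathcal{A}, \epsilon) - \frac{1}{n} \ln g_Q(\mathcal{A}, \epsilon_n) \right| < \gamma$$

whenever $n \ge n_0(\gamma, q, |\mathcal{X}|, |\mathcal{Y}|)$.

Proof of Lemma 4: By the Blowing Up Lemma [7,
Lemma 1.5.4, p. 92] and [7, Lemma 1.5.1, p. 86], given
the sequence $\{\epsilon_n\}_{n \ge 1}$, there exist $\{l_n\}$ and $\{\eta_n\}$ such that
$l_n/n \to 0$ and $\eta_n \to 1$ as $n \to \infty$, and such that the following two
properties hold.

For any $\gamma > 0$ and $n \ge n_0(\gamma, q, |\mathcal{X}|, |\mathcal{Y}|)$

$$\frac{1}{n} \ln |\Gamma^{l_n} \mathcal{B}| - \frac{1}{n} \ln |\mathcal{B}| < \gamma \quad \text{for every } \mathcal{B} \subset \mathcal{Y}^n, \qquad (107)$$

and for all $x^n \in \mathcal{X}^n$,

$$Q(\Gamma^{l_n} \mathcal{B} | x^n) \ge \eta_n \quad \text{whenever } Q(\mathcal{B} | x^n) \ge \epsilon_n. \qquad (108)$$

Now, assuming that $\mathcal{B}$ is an $\epsilon_n$-image of $\mathcal{A}$ with
$|\mathcal{B}| = g_Q(\mathcal{A}, \epsilon_n)$, the relation (108) means that $\Gamma^{l_n} \mathcal{B}$ is an $\eta_n$-image
of $\mathcal{A}$. Therefore we get

$$\frac{1}{n} \ln g_Q(\mathcal{A}, \eta_n) \le \frac{1}{n} \ln |\Gamma^{l_n} \mathcal{B}|
\le \gamma + \frac{1}{n} \ln |\mathcal{B}|
= \gamma + \frac{1}{n} \ln g_Q(\mathcal{A}, \epsilon_n) \qquad (109)$$

where the second inequality follows from (107). Finally, since
$\eta_n \to 1$ and $\epsilon_n \to 0$ as $n \to \infty$, for $n$ large enough we have

$$g_Q(\mathcal{A}, \epsilon) \le g_Q(\mathcal{A}, \eta_n) \quad \text{and} \quad \epsilon_n \le \epsilon,$$

and therefore from (109) we get

$$\frac{1}{n} \ln g_Q(\mathcal{A}, \epsilon_n) \le \frac{1}{n} \ln g_Q(\mathcal{A}, \epsilon) \le \gamma + \frac{1}{n} \ln g_Q(\mathcal{A}, \epsilon_n)$$

yielding the desired result. ■

We now use these lemmas to establish Theorem 5. Choose
$\epsilon, \gamma > 0$ such that $\epsilon + \gamma < l$. Let $(\mathcal{C}_n, \phi_n)$ be a coding scheme
that achieves maximum error probability $\epsilon$. Without loss of
generality, we assume that $\mathcal{C}_n \subset \mathcal{T}_P$. (If not, group codewords
into families of common type. The largest family of codewords
has error probability no larger than $\epsilon$, and its rate is essentially
the same as the rate of the original code $\mathcal{C}_n$.) Therefore

$$\frac{1}{n} \ln |\mathcal{C}_n| \le \frac{1}{n} \ln g_Q(\mathcal{C}_n, \epsilon + \gamma) - H(Q|P) + \gamma$$
$$\le \frac{1}{n} \ln g_Q(\mathcal{C}_n, l) - H(Q|P) + \gamma$$
$$\le \frac{1}{n} \ln g_Q(\mathcal{C}_n, \epsilon_n) - H(Q|P) + 2\gamma \qquad (110)$$

for $n \ge n_0(\gamma, q, l, |\mathcal{X}|, |\mathcal{Y}|)$, where the first and third inequalities
follow from Lemmas 3 and 4, respectively, and where the
second inequality follows since $g_Q(\mathcal{C}_n, \epsilon)$ is nondecreasing
in $\epsilon$. On the other hand, by Lemma 2, there exists a coding
scheme $(\mathcal{C}_n', \phi_n')$, with $\mathcal{C}_n' \subset \mathcal{C}_n$, that achieves a probability of
error upper bounded by $2\epsilon_n$ and such that its rate satisfies

$$\frac{1}{n} \ln |\mathcal{C}_n'| \ge \frac{1}{n} \ln g_Q(\mathcal{C}_n, \epsilon_n) - H(Q|P) - \gamma \qquad (111)$$

for $n \ge n_0(\gamma, q, |\mathcal{X}|, |\mathcal{Y}|)$. From (110) and (111) we deduce
that the rate of $\mathcal{C}_n'$ is lower bounded as

$$\frac{1}{n} \ln |\mathcal{C}_n'| \ge \frac{1}{n} \ln |\mathcal{C}_n| - 3\gamma$$

whenever $n \ge n_0(\gamma, q, l, |\mathcal{X}|, |\mathcal{Y}|)$. This yields the desired
result.

Abstract — Several aspects of the problem of asynchronous
point-to-point communication without feedback are developed
when the source is highly intermittent. In the system model of
interest, the codeword is transmitted at a random time within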
a prescribed window whose length corresponds to the level of 
asynchronism between the transmitter and the receiver. The 
decoder operates sequentially and communication rate is defined 
as the ratio between the message size and the elapsed time 
between when transmission commences and when the decoder 
makes a decision. 

For such systems, general upper and lower bounds on capacity 
as a function of the level of asynchronism are established, 
and are shown to coincide in some nontrivial cases. From 
these bounds, several properties of this asynchronous capacity 
are derived. In addition, the performance of training-based 
schemes is investigated. It is shown that such schemes, which 
implement synchronization and information transmission on 
separate degrees of freedom in the encoding, cannot achieve 
the asynchronous capacity in general, and that the penalty is 
particularly significant in the high-rate regime. 

Index Terms — asynchronous communication; bursty commu- 
nication; error exponents; sequential decoding; sparse commu- 
nication; synchronization 

I. Introduction 

INFORMATION-THEORETIC analysis of communication 
systems frequently ignores synchronization issues. In many 
applications where large amounts of data are to be transmitted, 
such simplifications may be justified. Simply prepending a 
suitable synchronization preamble to the initial data incurs 
negligible overhead yet ensures that the transmitter and the 
receiver are synchronized. In turn, various coding techniques 
(e.g., graph based codes, polar codes) may guarantee delay 
optimal communication for data transmission in the sense that 
they can achieve the capacity of the synchronous channel. 

In quantifying the impact due to a lack of synchronization
between a transmitter and a receiver, it is important to note
that asynchronism is a relative notion that depends on the size
of the data to be transmitted. For instance, in the above "low
asynchronism" setting it is implicitly assumed that the data is
large with respect to the timing uncertainty.

In a growing number of applications, such as many involving
sensor networks, data is transmitted in a bursty manner.
An example would be a sensor in a monitoring system. By
contrast with the previous setting, here timing uncertainty is
large with respect to the data to be transmitted.

To communicate in such "high asynchronism" regimes, one 
can use the traditional preamble based communication scheme 
for each block. Alternatively, one can pursue a fundamentally 
different strategy in which synchronization is integrated into 
the encoding of the data, rather than separated from it. 

To evaluate the relative merits of such diverse strategies, and 
more generally to explore fundamental performance limits, we 
recently introduced a general information-theoretic model for 
asynchronous communication in [3]. This model extends Shan- 
non's original communication model [4] to include asynchro- 
nism. In this model, the message is encoded into a codeword 
of fixed length, and this codeword starts being sent across a 
discrete memoryless channel at a time instant that is randomly 
and uniformly distributed over some predefined transmission 
window. The size of this window is known to transmitter 
and receiver, and the level of asynchronism in the system 
is governed by the size of the window with respect to the 
codeword length. Outside the information transmission period, 
whose duration equals the codeword length, the transmitter 
remains idle and the receiver observes noise, i.e., random 
output symbols. The receiver uses a sequential decoder whose 
scope is twofold: decide when to decode and what message 
to declare. 

The performance measure is the communication rate which 
is defined as the ratio between the message size and the 
average delay between when transmission starts and when the 
message is decoded. Capacity is the supremum of achievable 
rates, i.e., rates for which vanishing error probability can be 
guaranteed in the limit of long codeword length. 

The scaling between the transmission window and the 
codeword length that meaningfully quantifies the level of 
asynchronism in the system turns out to be exponential, i.e., 
$A = e^{\alpha n}$, where $A$ denotes the size of the transmission 
window, $n$ denotes the codeword length, and $\alpha$ denotes 
the asynchronism exponent. Indeed, as discussed 
in [3], if $A$ scales subexponentially in $n$, then asynchronism 
doesn't impact communication: the asynchronous capacity 
is equal to the capacity of the synchronous channel. By 
contrast, if the window size scales superexponentially, then 
the asynchrony is generally catastrophic. Hence, exponential 
asynchronism is the interesting regime and we aim to compute 
capacity as a function of the asynchronism exponent. 

For further motivation and background on the model, includ- 
ing a summary of related models (e.g., the insertion, deletion, 
and substitution channel model, and the detection and isolation 
model) we refer to [3, Section II]. Accordingly, we omit such 
material from the present paper. 

The first main result in [3] is the characterization of the 
synchronization threshold, which is defined as the largest 
asynchronism exponent for which it is still possible to guarantee 
reliable communication — this result is recalled in Theorem 1 
of Section IV. 

The second main result in [3] (see [3, Theorem 1]) is a lower 
bound to capacity. A main consequence of this bound is that 
for any rate below the capacity of the synchronous channel it 
is possible to accommodate a non-trivial asynchronism level, 
i.e., a positive asynchronism exponent. 

While this work focuses on rate, an alternative performance 
metric is the minimum energy (or, more generally, the minimum 
cost) needed to transmit one bit of information asynchronously. 
For this metric, [5], [6] establish the capacity 
per unit cost for the above bursty communication setup. 

We now provide a brief summary of the results contained 
in this paper: 

• General capacity lower bound, Theorem 2 and Corollary 1. 
Theorem 2 provides a lower bound to capacity which is 
obtained by considering a coding scheme that performs 
synchronization and information transmission jointly. The 
derived bound is much simpler and often much 
better than the one obtained in [3, Theorem 1]. 
Theorem 2, which holds for arbitrary discrete memoryless 
channels, also holds for a natural Gaussian setting, 
which yields Corollary 1. 

• General capacity upper bound, Theorem 3. This bound 
and the above lower bound, although not tight in general, 
provide interesting and surprising insights into the 
asynchronous capacity. For instance, Corollary 2 says that, in 
general, it is possible to reliably achieve a communication 
rate equal to the capacity of the synchronous channel 
while operating at a strictly positive asynchronism expo- 
nent. In other words, it is possible to accommodate both 
a high rate and an exponential asynchronism. 
Another insight is provided by Corollary 3, which relates 
to the very low rate communication regime. This result 
says that, in general, one needs to (sometimes signifi- 
cantly) back off from the synchronization threshold in 
order to be able to accommodate a positive rate. As a 
consequence, capacity as a function of the asynchronism 
exponent does not, in general, strictly increase as the 
latter decreases. 

• Capacity for channels with infinite synchronization 
threshold. Theorem 4. For the class of channels for 
which there exists a particular channel input which can't 
be confused with noise, a closed-form expression for 
capacity is established. 

• Suboptimality of training based schemes. Theorem 6, 
Corollaries 4 and 5. These results show that commu- 
nication strategies that separate synchronization from 
information transmission do not achieve the asynchronous 
capacity in general. 

• Good synchronous codes, Theorem 5. This result may 
be of independent interest and relates to synchronous 
communication. It says that any codebook that achieves a 
nontrivial error probability contains a large subcodebook, 
whose rate is almost the same as the rate of the original 
codebook, and whose error probability decays exponen- 
tially with the blocklength with a suitable decoder. This 
result, which is a byproduct of our analysis, is a stronger 
version of [7, Corollary 1.9, p. 107] and its proof amounts 
to a tightening of some of the arguments in the proof of 
the latter. 

It is worth noting that most of our proof techniques differ 
in some significant respects from more traditional capacity 
analysis for synchronous communication; for example, we 
make little use of Fano's inequality for converse arguments. 
The reason for this is that there are decoding error events 
specific to asynchronous communication. One such event is 
when the decoder, unaware of the information transmission 
time, declares a message before transmission even starts. 

An outline of the paper is as follows. Section II summarizes 
some notational conventions and standard results we make use 
of throughout the paper. Section III describes the communication 
model of interest. Section IV contains our main results, 
and Section V is devoted to the proofs. Section VI contains 
some concluding remarks. 

II. Notation and Preliminaries 

In general, we reserve upper case letters for random vari- 
ables (e.g., X) and lower case letters to denote their cor- 
responding sample values (e.g., x), though as is customary, 
we make a variety of exceptions. Any potential confusion is 
generally avoided by context. In addition, we use $x_i^j$ to denote 
the sequence $x_i, x_{i+1}, \ldots, x_j$, for $i < j$. Moreover, when 
$i = 1$ we use the usual simpler notation $x^n$ as an alternative 
to $x_1^n$. Additionally, $\triangleq$ denotes "equality by definition." 

Events (e.g., $\mathcal{E}$) and sets (e.g., $\mathcal{S}$) are denoted using calligraphic 
fonts, and if $\mathcal{E}$ represents an event, $\mathcal{E}^c$ denotes its 
complement. As additional notation, $\mathbb{P}[\cdot]$ and $\mathbb{E}[\cdot]$ denote the 
probability and expectation of their arguments, respectively, 
$\|\cdot\|$ denotes the $L_1$ norm of its argument, $|\cdot|$ denotes 
absolute value if its argument is numeric, or cardinality if its 
argument is a set, $\lfloor\cdot\rfloor$ denotes the integer part of its argument, 
$a \wedge b = \min\{a, b\}$, and $x^+ = \max\{0, x\}$. Furthermore, we 
use $\subset$ to denote nonstrict set inclusion, and use the Kronecker 
notation $\mathbb{1}(\mathcal{A})$ for the function that takes value one if the event 
$\mathcal{A}$ is true and zero otherwise. 

We also make use of some familiar order notation for 
asymptotics (see, e.g., [8, Chapter 3]). We use $o(\cdot)$ and $\omega(\cdot)$ 
to denote (positive or negative) quantities that grow strictly 
slower and strictly faster, respectively, than their arguments; 
e.g., $o(1)$ denotes a vanishing term and $n/\ln n = \omega(\sqrt{n})$. We 
also use $O(\cdot)$ and $\Omega(\cdot)$, defined analogously to $o(\cdot)$ and $\omega(\cdot)$, 
respectively, but without the strictness constraint. Finally, we 
use $\mathrm{poly}(\cdot)$ to denote a function that does not grow or decay 
faster than polynomially in its argument. 

We use $\mathbb{P}(\cdot)$ to denote the probability of its argument, and 
use $\mathcal{P}^{\mathcal{X}}$, $\mathcal{P}^{\mathcal{Y}}$, and $\mathcal{P}^{\mathcal{X},\mathcal{Y}}$ to denote the set of distributions over 
the finite alphabets $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{X} \times \mathcal{Y}$ respectively, and use 
$\mathcal{P}^{\mathcal{Y}|\mathcal{X}}$ to denote the set of conditional distributions of the form 
$V(y|x)$ for $(x, y) \in \mathcal{X} \times \mathcal{Y}$. 

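For a memoryless channel characterized by channel law 
$Q \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$, the probability of the output sequence $y^n \in \mathcal{Y}^n$ 
given an input sequence $x^n \in \mathcal{X}^n$ is 

$$Q(y^n|x^n) = \prod_{i=1}^{n} Q(y_i|x_i).$$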

Throughout the paper, Q always refers to the underlying 
channel and C denotes its synchronous capacity. 

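Additionally, we use $J_{\mathcal{X}}$ and $J_{\mathcal{Y}}$ to denote the left and right 
marginals, respectively, of the joint distribution $J \in \mathcal{P}^{\mathcal{X},\mathcal{Y}}$, i.e., 

$$J_{\mathcal{X}}(x) = \sum_{y \in \mathcal{Y}} J(x,y) \quad \text{and} \quad J_{\mathcal{Y}}(y) = \sum_{x \in \mathcal{X}} J(x,y).$$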

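We define all information measures relative to the natural 
logarithm. Thus, the entropy associated with $P \in \mathcal{P}^{\mathcal{X}}$ is¹ 

$$H(P) \triangleq -\sum_{x \in \mathcal{X}} P(x) \ln P(x),$$

and the conditional entropy associated with $Q \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$ and 
$P \in \mathcal{P}^{\mathcal{X}}$ is 

$$H(Q|P) \triangleq -\sum_{x \in \mathcal{X}} P(x) \sum_{y \in \mathcal{Y}} Q(y|x) \ln Q(y|x).$$

Similarly, the mutual information induced by $J(\cdot,\cdot) \in \mathcal{P}^{\mathcal{X},\mathcal{Y}}$ 
is 

$$I(J) \triangleq \sum_{(x,y) \in \mathcal{X} \times \mathcal{Y}} J(x,y) \ln \frac{J(x,y)}{J_{\mathcal{X}}(x) J_{\mathcal{Y}}(y)},$$

so 

$$I(PQ) \triangleq \sum_{x \in \mathcal{X}} P(x) \sum_{y \in \mathcal{Y}} Q(y|x) \ln \frac{Q(y|x)}{\sum_{x' \in \mathcal{X}} P(x') Q(y|x')}$$

for $P \in \mathcal{P}^{\mathcal{X}}$ and $Q \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$. Furthermore, the information 
divergence (Kullback-Leibler distance) between $P_1 \in \mathcal{P}^{\mathcal{X}}$ and 
$P_2 \in \mathcal{P}^{\mathcal{X}}$ is 

$$D(P_1 \| P_2) \triangleq \sum_{x \in \mathcal{X}} P_1(x) \ln \frac{P_1(x)}{P_2(x)},$$

and conditional information divergence is denoted using 

$$D(W_1 \| W_2 | P) \triangleq \sum_{x \in \mathcal{X}} P(x) \sum_{y \in \mathcal{Y}} W_1(y|x) \ln \frac{W_1(y|x)}{W_2(y|x)} = D(PW_1 \| PW_2),$$

where $P \in \mathcal{P}^{\mathcal{X}}$ and $W_1, W_2 \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$. As a specialized notation, 
we use 

$$D_B(\epsilon_1 \| \epsilon_2) = \epsilon_1 \ln \frac{\epsilon_1}{\epsilon_2} + (1 - \epsilon_1) \ln \frac{1 - \epsilon_1}{1 - \epsilon_2}$$

to denote the divergence between Bernoulli distributions with 
parameters $\epsilon_1, \epsilon_2 \in [0, 1]$. 

¹In the definition of all such information measures, we use the usual 
convention $0 \ln(0/0) = 0$. 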
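
The information measures above can be evaluated numerically for small alphabets. The following Python sketch is an illustration only: the function names and the list-of-rows representation of the channel matrix are assumptions of this example, not notation from the paper. All quantities are in nats.

```python
import math


def entropy(P):
    """H(P) = -sum_x P(x) ln P(x) in nats, with the convention 0 ln 0 = 0."""
    return -sum(p * math.log(p) for p in P if p > 0)


def kl_divergence(P1, P2):
    """D(P1 || P2) in nats; infinite if P1 puts mass where P2 has none."""
    total = 0.0
    for p1, p2 in zip(P1, P2):
        if p1 == 0:
            continue  # terms with P1(x) = 0 contribute nothing
        if p2 == 0:
            return math.inf
        total += p1 * math.log(p1 / p2)
    return total


def mutual_information(P, Q):
    """I(PQ) for input distribution P and channel matrix Q[x][y]."""
    n_out = len(Q[0])
    # Output marginal (PQ)_Y(y) = sum_x P(x) Q(y|x).
    out = [sum(P[x] * Q[x][y] for x in range(len(P))) for y in range(n_out)]
    total = 0.0
    for x, px in enumerate(P):
        for y in range(n_out):
            if px > 0 and Q[x][y] > 0:
                total += px * Q[x][y] * math.log(Q[x][y] / out[y])
    return total


# Hypothetical example: binary symmetric channel with crossover 0.1.
bsc = [[0.9, 0.1], [0.1, 0.9]]
rate = mutual_information([0.5, 0.5], bsc)
```

For the uniform input, `rate` equals $\ln 2 - H_b(0.1)$, the familiar capacity expression for a binary symmetric channel in nats.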



We make frequent use of the method of types [7, Chapter 1.2]. 
In particular, $\hat{P}_{x^n}$ denotes the empirical distribution 
(or type) of a sequence $x^n \in \mathcal{X}^n$, i.e.,² 

$$\hat{P}_{x^n}(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}(x_i = x).$$

The joint empirical distribution $\hat{P}_{x^n,y^n}$ for a sequence pair 
$(x^n, y^n)$ is defined analogously, i.e., 

$$\hat{P}_{x^n,y^n}(x,y) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}(x_i = x, y_i = y),$$

and, in turn, a sequence $y^n$ is said to have a conditional 
empirical distribution $\hat{P}_{y^n|x^n} \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$ given $x^n$ if for all 
$(x,y) \in \mathcal{X} \times \mathcal{Y}$, 

$$\hat{P}_{x^n,y^n}(x,y) = \hat{P}_{x^n}(x)\, \hat{P}_{y^n|x^n}(y|x).$$

As additional notation, $P \in \mathcal{P}^{\mathcal{X}}$ is said to be an $n$-type 
if $nP(x)$ is an integer for all $x \in \mathcal{X}$. The set of all $n$-types 
over an alphabet $\mathcal{X}$ is denoted using $\mathcal{P}_n^{\mathcal{X}}$. The $n$-type class 
of $P$, denoted using $\mathcal{T}_P^n$, is the set of all sequences $x^n$ that 
have type $P$, i.e., such that $\hat{P}_{x^n} = P$. A set of sequences is 
said to have constant composition if they belong to the same 
type class. When clear from the context, we sometimes omit 
the superscript $n$ and simply write $\mathcal{T}_P$. For distributions on 
the alphabet $\mathcal{X} \times \mathcal{Y}$ the set of joint $n$-types $\mathcal{P}_n^{\mathcal{X},\mathcal{Y}}$ is defined 
analogously. The set of sequences $y^n$ that have a conditional 
type $W$ given $x^n$ is denoted by $\mathcal{T}_W(x^n)$, and $\mathcal{P}_n^{\mathcal{Y}|\mathcal{X}}$ denotes 
the set of empirical conditional distributions, i.e., the set of 
$W \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$ such that $W = \hat{P}_{y^n|x^n}$ for some $(x^n, y^n) \in 
\mathcal{X}^n \times \mathcal{Y}^n$. 
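
To make the type definitions concrete, the sketch below computes the empirical, joint, and conditional types of short sequences using exact rational arithmetic. The function names and the example sequences are hypothetical choices made for illustration.

```python
from collections import Counter
from fractions import Fraction


def empirical_type(seq):
    """Empirical distribution of a sequence: symbol -> (count / n)."""
    n = len(seq)
    return {symbol: Fraction(count, n) for symbol, count in Counter(seq).items()}


def joint_type(xs, ys):
    """Joint empirical distribution of the pair sequence (x_i, y_i)."""
    return empirical_type(list(zip(xs, ys)))


def conditional_type(xs, ys):
    """Conditional type of ys given xs, keyed by (x, y) -> P(y|x).

    Only pairs (x, y) that actually occur are listed.
    """
    p_x = empirical_type(xs)
    p_xy = joint_type(xs, ys)
    return {(x, y): p / p_x[x] for (x, y), p in p_xy.items()}


# Hypothetical sequences of length n = 3.
xs, ys = "aab", "010"
p_x = empirical_type(xs)
p_xy = joint_type(xs, ys)
w = conditional_type(xs, ys)
```

Because `Fraction` is exact, the factorization of the joint type into the type of `xs` times the conditional type can be checked with equality rather than a tolerance.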

Finally, the following three standard type results are often 
used in our analysis. 

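Fact 1 ([7, Lemma 1.2.2]): 

$$|\mathcal{P}_n^{\mathcal{X}}| \le (n+1)^{|\mathcal{X}|}$$
$$|\mathcal{P}_n^{\mathcal{X},\mathcal{Y}}| \le (n+1)^{|\mathcal{X}| \cdot |\mathcal{Y}|}$$
$$|\mathcal{P}_n^{\mathcal{Y}|\mathcal{X}}| \le (n+1)^{|\mathcal{X}| \cdot |\mathcal{Y}|}.$$

Fact 2 ([7, Lemma 1.2.6]): If $X^n$ is independent and identically 
distributed (i.i.d.) according to $P_1 \in \mathcal{P}^{\mathcal{X}}$, then 

$$\frac{1}{(n+1)^{|\mathcal{X}|}} e^{-nD(P_2 \| P_1)} \le \mathbb{P}(X^n \in \mathcal{T}_{P_2}) \le e^{-nD(P_2 \| P_1)}$$

for any $P_2 \in \mathcal{P}_n^{\mathcal{X}}$. 

Fact 3 ([7, Lemma 1.2.6]): If the input $x^n \in \mathcal{X}^n$ to a 
memoryless channel $Q \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$ has type $P \in \mathcal{P}^{\mathcal{X}}$, then the 
probability of observing a channel output sequence $Y^n$ which 
lies in $\mathcal{T}_W(x^n)$ satisfies 

$$\frac{1}{(n+1)^{|\mathcal{X}| |\mathcal{Y}|}} e^{-nD(W \| Q | P)} \le \mathbb{P}(Y^n \in \mathcal{T}_W(x^n) \,|\, x^n) \le e^{-nD(W \| Q | P)}$$

for any $W \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$ such that $\mathcal{T}_W(x^n)$ is non-empty. 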
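
As a sanity check of Fact 2, the exact probability of a type class can be computed from a multinomial coefficient and compared with the two exponential bounds. The sketch below is illustrative; the helper names are hypothetical, and it assumes $P_1(x) > 0$ wherever the type $P_2$ has positive mass.

```python
import math


def kl_divergence(P2, P1):
    """D(P2 || P1) in nats, assuming P1 > 0 wherever P2 > 0."""
    return sum(a * math.log(a / b) for a, b in zip(P2, P1) if a > 0)


def type_class_probability(counts, P1):
    """Exact P(X^n in T_{P2}) for i.i.d. X^n ~ P1, where P2(x) = counts[x] / n."""
    n = sum(counts)
    size = math.factorial(n)          # multinomial coefficient = |T_{P2}|
    for c in counts:
        size //= math.factorial(c)
    prob_each = math.prod(p ** c for p, c in zip(P1, counts))
    return size * prob_each


def fact2_bounds(counts, P1):
    """Lower and upper bounds of Fact 2 for the type with the given counts."""
    n = sum(counts)
    P2 = [c / n for c in counts]
    upper = math.exp(-n * kl_divergence(P2, P1))
    lower = upper / (n + 1) ** len(counts)
    return lower, upper


# Hypothetical example: n = 10 fair coin flips, type (0.3, 0.7).
counts, P1 = [3, 7], [0.5, 0.5]
exact = type_class_probability(counts, P1)
lower, upper = fact2_bounds(counts, P1)
```

Here `exact` is $\binom{10}{3} 2^{-10}$, which falls between the two bounds as Fact 2 states.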

²When the sequence that induces the empirical type is clear from context, 
we omit the subscript and write simply $\hat{P}$. 



III. Model and Performance Criterion 

The asynchronous communication model of interest cap- 
tures the setting where infrequent delay-sensitive data must be 
reliably communicated. For a discussion of this model and its 
connections with related communication and statistical models 
we refer to [3, Section II]. 

We consider discrete-time communication without feedback 
over a discrete memoryless channel characterized by its finite 
input and output alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively, and transition 
probability matrix $Q(y|x)$, for all $y \in \mathcal{Y}$ and $x \in \mathcal{X}$. Without 
loss of generality, we assume that for all $y \in \mathcal{Y}$ there is some 
$x \in \mathcal{X}$ for which $Q(y|x) > 0$. 

There are $M \ge 2$ messages $m \in \{1, 2, \ldots, M\}$. For each 
message $m$, there is an associated codeword 

$$c^n(m) \triangleq c_1(m)\, c_2(m) \cdots c_n(m),$$

which is a string of $n$ symbols drawn from $\mathcal{X}$. The $M$ 
codewords form a codebook $\mathcal{C}_n$ (whence $|\mathcal{C}_n| = M$). 
Communication takes place as follows. The transmitter selects a 
message $m$ randomly and uniformly over the message set and 
starts sending the corresponding codeword $c^n(m)$ at a random 
time $\nu$, unknown to the receiver, independent of $c^n(m)$, and 
uniformly distributed over $\{1, 2, \ldots, A\}$, where $A = e^{\alpha n}$ is 
referred to as the asynchronism level of the channel, with $\alpha$ 
termed the associated asynchronism exponent. The transmitter 
and the receiver know the integer parameter $A \ge 1$. The 
special case $A = 1$ (i.e., $\alpha = 0$) corresponds to the classical 
synchronous communication scenario. 

When a codeword is transmitted, a noise-corrupted version 
of the codeword is obtained at the receiver. When the transmitter 
is silent, the receiver observes only noise. To characterize 
the output distribution when no input is provided to the 
channel, we make use of a specially designated "no-input" 
symbol ★ in the input alphabet $\mathcal{X}$, as depicted in Figs. 1 and 2. 
Specifically, 

$$Q_\star \triangleq Q(\cdot|\star) \qquad (1)$$

characterizes the noise distribution of the channel. Hence, 
conditioned on the value of $\nu$ and on the message $m$ 
to be conveyed, the receiver observes independent symbols 
$Y_1, Y_2, \ldots, Y_{A+n-1}$ distributed as follows. If 

$$t \in \{1, 2, \ldots, \nu - 1\}$$

or 

$$t \in \{\nu + n, \nu + n + 1, \ldots, A + n - 1\},$$

the distribution of $Y_t$ is $Q_\star$. If 

$$t \in \{\nu, \nu + 1, \ldots, \nu + n - 1\},$$

the distribution of $Y_t$ is $Q(\cdot|c_{t-\nu+1}(m))$. Note that since the 
transmitter can choose to be silent for arbitrary portions of its 
length-$n$ transmission as part of its message-encoding strategy, 
the symbol ★ is eligible for use in the codebook design. 
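
The output process just described can be simulated directly. The following sketch is a minimal illustration under assumed conventions: the channel is a dictionary `Q[x][y]` of transition probabilities, the no-input symbol is passed in as `star`, and the function name is hypothetical.

```python
import random


def simulate_output(codeword, A, Q, star, rng):
    """Draw nu uniformly on {1, ..., A} and return (nu, [Y_1, ..., Y_{A+n-1}]).

    Outside the transmission period the channel input is the no-input
    symbol `star`; during it, the input at time t is c_{t - nu + 1}.
    """
    n = len(codeword)
    nu = rng.randint(1, A)  # inclusive on both ends
    outputs = []
    for t in range(1, A + n):
        if nu <= t <= nu + n - 1:
            x = codeword[t - nu]  # c_{t-nu+1} with 0-based indexing
        else:
            x = star
        symbols = list(Q[x].keys())
        weights = list(Q[x].values())
        outputs.append(rng.choices(symbols, weights=weights)[0])
    return nu, outputs


# Hypothetical noiseless example: silence always yields 0, input "a" yields 1.
Q = {"*": {0: 1.0, 1: 0.0}, "a": {0: 0.0, 1: 1.0}}
nu, ys = simulate_output(["a"] * 3, A=10, Q=Q, star="*", rng=random.Random(0))
```

In this noiseless toy channel the three ones in `ys` sit exactly at positions $\nu, \nu+1, \nu+2$, which makes the role of the unknown start time visible.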

The decoder takes the form of a sequential test $(\tau, \phi)$, where 
$\tau$ is a stopping time, bounded by $A + n - 1$, with respect 
to the output sequence $Y_1, Y_2, \ldots$, indicating when decoding 
happens, and where $\phi$ denotes a decision rule that declares 
the decoded message; see Fig. 2. Recall that a stopping time 
$\tau$ (deterministic or randomized) is an integer-valued random 
variable with respect to a sequence of random variables 
$\{Y_i\}_{i=1}^{\infty}$ so that the event $\{\tau = t\}$, conditioned on $\{Y_i\}_{i=1}^{t}$, 
is independent of $\{Y_i\}_{i=t+1}^{\infty}$, for all $t \ge 1$. The function 
$\phi$ is then defined as any $\mathcal{F}_\tau$-measurable map taking values 
in $\{1, 2, \ldots, M\}$, where $\mathcal{F}_1, \mathcal{F}_2, \ldots$ is the natural filtration 
induced by the process $Y_1, Y_2, \ldots$. 

A code is an encoder/decoder pair $(\mathcal{C}, (\tau, \phi))$. 

Fig. 1. Graphical depiction of the transmission matrix for an asynchronous 
discrete memoryless channel. The "no input" symbol ★ is used to characterize 
the channel output when the transmitter is silent. 

Fig. 2. Temporal representation of the channel input sequence (upper axis) 
and channel output sequence (lower axis). At time $\nu$ message $m$ starts being 
sent and decoding occurs at time $\tau$. Since $\nu$ is unknown at the receiver, 
the decoding time may be before the entire codeword has been received, 
potentially (but not necessarily) resulting in a decoding error. 

The performance of a code operating over an asynchronous 
channel is quantified as follows. First, we define the maximum 
(over messages), time-averaged decoding error probability⁴ 

$$\mathbb{P}(\mathcal{E}) \triangleq \max_{m} \frac{1}{A} \sum_{t=1}^{A} \mathbb{P}_{m,t}(\mathcal{E}), \qquad (2)$$

where $\mathcal{E}$ indicates the event that the decoded message does not 
correspond to the sent message, and where the subscripts $m, t$ 
indicate the conditioning on the event that message $m$ starts 
being sent at time $\nu = t$. Note that by definition we have 

$$\mathbb{P}_{m,t}(\mathcal{E}) = \mathbb{P}_{m,t}(\phi(Y^{\tau}) \ne m).$$

Second, we define communication rate with respect to the 
average elapsed time between the time the codeword starts 
being sent and the time the decoder makes a decision, i.e., 

$$R \triangleq \frac{\ln M}{\Delta}, \qquad (3)$$

where 

$$\Delta \triangleq \max_{m} \frac{1}{A} \sum_{t=1}^{A} \mathbb{E}_{m,t}(\tau - t)^+, \qquad (4)$$

where $x^+$ denotes $\max\{0, x\}$, and where $\mathbb{E}_{m,t}$ denotes the 
expectation with respect to $\mathbb{P}_{m,t}$.⁵ 

³Note that the proposed asynchronous discrete-time communication model 
still assumes some degree of synchronization since transmitter and receiver 
are supposed to have access to clocks ticking in unison. This is sometimes 
referred to as frame asynchronous symbol synchronous communication. 

⁴Note that there is a small abuse of notation as $\mathbb{P}(\mathcal{E})$ need not be a 
probability. 
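
The definitions (2)-(4) amount to a maximum over messages of a time average. As a small illustration, the sketch below evaluates them from tables of per-message, per-start-time quantities. The tables and function names are hypothetical placeholders; in practice the entries would come from analysis or simulation of a specific code.

```python
import math


def max_time_averaged(table):
    """max over messages m of (1/A) * sum over start times t of table[m][t]."""
    return max(sum(row) / len(row) for row in table)


def performance(error_prob, expected_delay, M):
    """Return (P(E), Delta, R) following definitions (2), (4), and (3).

    error_prob[m][t]     plays the role of P_{m,t}(E)
    expected_delay[m][t] plays the role of E_{m,t}(tau - t)^+
    """
    p_err = max_time_averaged(error_prob)
    delta = max_time_averaged(expected_delay)
    rate = math.log(M) / delta  # nats per unit of reaction delay
    return p_err, delta, rate


# Hypothetical numbers for M = 2 messages and A = 4 possible start times.
error_prob = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
expected_delay = [[10.0] * 4, [12.0] * 4]
p_err, delta, rate = performance(error_prob, expected_delay, M=2)
```

Note that both the error probability and the delay are governed by the worst message, so a code cannot trade a slow message against a fast one.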

With these definitions, the class of communication strategies 
of interest is as follows. 

Definition 1 ($(R, \alpha)$ Coding Scheme): A pair $(R, \alpha)$ with 
$R \ge 0$ and $\alpha \ge 0$ is achievable if there exists a sequence 
$\{(\mathcal{C}_n, (\tau_n, \phi_n))\}_{n \ge 1}$ of codes, indexed by the codebook length 
$n$, that asymptotically achieves a rate $R$ at an asynchronism 
exponent $\alpha$. This means that for any $\epsilon > 0$ and every $n$ large 
enough, the code $(\mathcal{C}_n, (\tau_n, \phi_n))$ 

1) operates under asynchronism level $A_n = e^{(\alpha - \epsilon) n}$; 

2) yields a rate at least equal to $R - \epsilon$; 

3) achieves a maximum error probability of at most $\epsilon$. 

An $(R, \alpha)$ coding scheme is a sequence $\{(\mathcal{C}_n, (\tau_n, \phi_n))\}_{n \ge 1}$ 
that achieves the rate-exponent pair $(R, \alpha)$. 

In turn, capacity for our model is defined as follows. 

Definition 2 (Asynchronous Capacity): For given $\alpha \ge 0$, 
the asynchronous capacity $R(\alpha)$ is the supremum of the set 
of rates that are achievable at asynchronism exponent $\alpha$. 
Equivalently, the asynchronous capacity is characterized by 
$\alpha(R)$, defined as the supremum of the set of asynchronism 
exponents that are achievable at rate $R \ge 0$. 

Accordingly, we use the term "asynchronous capacity" to 
designate either $R(\alpha)$ or $\alpha(R)$. While $R(\alpha)$ may have the 
more natural immediate interpretation, most of our results are 
more conveniently expressed in terms of $\alpha(R)$. 

In agreement with our notational convention, the capacity of 
the synchronous channel, which corresponds to the case where 
$\alpha = 0$, is simply denoted by $C$ instead of $R(0)$. Throughout 
the paper we only consider channels with $C > 0$. 

Remark 1: One could alternatively consider the rate with 
respect to the duration the transmitter occupies the channel 
and define it with respect to the block length n. In this case 
capacity is a special case of the general asynchronous capacity 
per unit cost result [5, Theorem 1]. 

In [3], [9] it is shown that reliable communication is possible 
if and only if the asynchronism exponent $\alpha$ does not exceed 
a limit referred to as the "synchronization threshold." 

Theorem 1 ([3, Theorem 2], [9]): If the asynchronism exponent 
is strictly smaller than the synchronization threshold 

$$\alpha_\circ \triangleq \max_{x} D(Q(\cdot|x) \| Q_\star) = \alpha(R = 0),$$

then there exists a coding scheme $\{(\mathcal{C}_n, (\tau_n, \phi_n))\}_{n \ge 1}$ that 
achieves a maximum error probability tending to zero as $n \to \infty$. 

Conversely, any coding scheme $\{(\mathcal{C}_n, (\tau_n, \phi_n))\}_{n \ge 1}$ that 
operates at an asynchronism exponent strictly greater than the 
synchronization threshold achieves (as $n \to \infty$) a maximum 
probability of error equal to one. 

Moreover,⁶ 

$$\alpha_\circ > 0 \quad \text{if and only if} \quad C > 0.$$
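
The synchronization threshold is a simple maximization over input symbols and is easy to evaluate for a given channel. The sketch below is illustrative: the dictionary representation and function names are assumptions of this example, and the first channel uses the parameters quoted later in the caption of Fig. 5 (crossover 0.1, with ★ behaving like input 0).

```python
import math


def kl_divergence(P1, P2):
    """D(P1 || P2) in nats; infinite if P1 puts mass where P2 has none."""
    total = 0.0
    for p1, p2 in zip(P1, P2):
        if p1 == 0:
            continue
        if p2 == 0:
            return math.inf
        total += p1 * math.log(p1 / p2)
    return total


def synchronization_threshold(Q, star):
    """max over inputs x of D(Q(.|x) || Q_star), with Q_star = Q(.|star)."""
    return max(kl_divergence(Q[x], Q[star]) for x in Q)


# Binary symmetric channel, crossover 0.1, silence behaving like input 0.
bsc_star_as_zero = {"*": [0.9, 0.1], 0: [0.9, 0.1], 1: [0.1, 0.9]}
threshold = synchronization_threshold(bsc_star_as_zero, "*")

# Hypothetical channel where noise can never produce output 1.
noise_misses_output = {"*": [1.0, 0.0], 1: [0.1, 0.9]}
threshold_inf = synchronization_threshold(noise_misses_output, "*")
```

For the first channel the maximizing input is 1, giving $0.8 \ln 9$ nats. The second channel has an infinite threshold because pure noise cannot generate every output.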

⁵Note that $\mathbb{E}_{m,t}(\tau_n - t)^+$ should be interpreted as $\mathbb{E}_{m,t}((\tau_n - t)^+)$. 
⁶This claim appeared in [3, p. 4515]. 

A few comments are in order. The cause of unreliable 
communication above the synchronization threshold is the 
following. When asynchronism is so large, with probability 
approaching one pure noise mimics a codeword for any 
codebook (regardless of the rate) before the actual codeword 
even starts being sent.⁷ This results in an error probability of 
at least 1/2 since, by our model assumption, the message set 
contains at least two messages. On the other hand, below the 
synchronization threshold reliable communication is possible. 
If the codebook is properly chosen, the noise won't mimic any 
codeword with probability tending to one, which allows the 
decoder to reliably detect the sent message. 
Note that 

$$\alpha_\circ = \infty$$

if and only if pure noise can't generate all channel outputs, 
i.e., if and only if $Q_\star(y) = 0$ for some $y \in \mathcal{Y}$. Indeed, in this 
case it is possible to avoid the previously mentioned decod- 
ing confusion by designing codewords (partly) composed of 
symbols that generate channel outputs which are impossible 
to generate with pure noise. 

The last claim in Theorem 1 says that reliable asynchronous 
communication is possible if and only if reliable synchronous 
communication is possible. That the former implies the latter 
is obvious since asynchronism can only hurt communication. 
That the latter implies the former is perhaps less obvious, and a 
high-level justification is as follows. When C > 0, at least two 
channel inputs yield different conditional output distributions, 
for otherwise the input-output mutual information is zero 
regardless of the input distribution. Hence, $Q(\cdot|\star) \ne Q(\cdot|x)$ for 
some $x \ne \star$. Now, by designing codewords mainly composed 
of $x$ it is possible to reliably signal the codeword's location to 
the decoder even under an exponential asynchronism, since the 
channel outputs look statistically different than noise during 
the message transmission. Moreover, if the message set is 
small enough, it is possible to guarantee reliable message 
location and successfully identify which message from the 
message set was sent. Therefore, exponential asynchronism 
can be accommodated, hence $\alpha_\circ > 0$. 

Finally, it should be pointed out that in [3] all the results 
are stated with respect to average (over messages) delay and 
error probability in place of maximum (over messages) delay 
and error probability as in this paper. Nevertheless, the same 
results hold in the latter case as discussed briefly later at the 
end of Section V. 

IV. Main Results 

This section is divided into two parts. In Section IV-A, 
we provide general upper and lower bounds on capacity, 
and derive several of its properties. In Section IV-B, we 
investigate the performance limits of training-based schemes 
and establish their suboptimality in a certain communication 
regime. Since both sections can be read independently, the 
practically inclined reader may read Section IV-B first. 

All of our results assume a uniform distribution on $\nu$. 
Nevertheless, this assumption is not critical in our proofs. 
The results can be extended to non-uniform distributions by 
following the same arguments as those used to establish 
asynchronous capacity per unit cost for non-uniform $\nu$ [5, 
Theorem 5]. 

⁷This follows from the converse of [9, Theorem], which says that above 
$\alpha_\circ$, even the codeword of a single codeword codebook is mislocated with 
probability tending to one. 

A. General Bounds on Asynchronous Capacity 

To communicate reliably, whether synchronously or asyn- 
chronously, the input-output mutual information induced by 
the codebook should at least be equal to the desired commu- 
nication rate. 

When communication is asynchronous, a decoder should, in 
addition, be able to discriminate between hypothesis "noise" 
and hypothesis "message." These hypotheses correspond to the 
situations when the transmitter is idle and when it transmits a 
codeword, respectively. Intuitively, the more these hypotheses 
are statistically far apart — by means of an appropriate code- 
book design — the larger the level of asynchronism which can 
be accommodated for a given communication rate. 

More specifically, a code should serve the dual purpose of 
minimizing the "false-alarm" and "miss" error probabilities. 

Since the decoder doesn't know $\nu$, the decoder may output a 
message even before a message is sent. This is the false-alarm 
event and it contributes to increasing the error probability: 
conditioned on a false alarm, the error probability is essentially 
one. However, false alarms also contribute to increasing the rate 
since it is defined with respect to the receiver's decoding delay 
$\mathbb{E}(\tau - \nu)^+$. As an extreme case, by immediately decoding, 
i.e., by setting $\tau = 1$, we get an infinite rate and an error 
probability (asymptotically) equal to one. As it turns out, the 
false-alarm probability should be exponentially small to allow 
reliable communication under exponential asynchronism. 

The miss event refers to the scenario where the decoder 
fails to recognize the sent message during transmission, i.e., 
the message output looks like it was generated by noise. This 
event impacts the rate and, to a smaller extent, also the error 
probability. In fact, when the sent message is missed, the 
reaction delay is usually huge, of the order of A. Therefore, 
to guarantee a positive rate under exponential asynchronism 
the miss error probability should also be exponentially small. 

Theorem 2 below provides a lower bound on the asyn- 
chronous capacity. The proof of this theorem is obtained by 
analyzing a coding scheme which performs synchronization 
and information transmission jointly. The codebook is a stan- 
dard i.i.d. random code across time and messages and its 
performance is governed by the Chernoff error exponents for 
discriminating hypothesis "noise" from hypothesis "message." 

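Theorem 2 (Lower Bound on Asynchronous Capacity): 
Let $\alpha \ge 0$ and let $P \in \mathcal{P}^{\mathcal{X}}$ be some input distribution such 
that at least one of the following inequalities 

$$D(V \| (PQ)_{\mathcal{Y}}) \ge \alpha$$
$$D(V \| Q_\star) \ge \alpha$$

holds for all distributions $V \in \mathcal{P}^{\mathcal{Y}}$, i.e., 

$$\min_{V \in \mathcal{P}^{\mathcal{Y}}} \max\{D(V \| (PQ)_{\mathcal{Y}}), D(V \| Q_\star)\} \ge \alpha.$$

Then, the rate-exponent pair $(R = I(PQ), \alpha)$ is achievable. 
Thus, maximizing over all possible input distributions, we have 
the following lower bound on $\alpha(R)$ in Definition 2: 

$$\alpha(R) \ge \alpha_-(R), \qquad R \in (0, C] \qquad (5)$$

where 

$$\alpha_-(R) \triangleq \max_{\{P \in \mathcal{P}^{\mathcal{X}} :\, I(PQ) \ge R\}} \ \min_{V \in \mathcal{P}^{\mathcal{Y}}} \max\{D(V \| (PQ)_{\mathcal{Y}}), D(V \| Q_\star)\}. \qquad (6)$$

Fig. 3. If $\alpha$ is at most the "half-distance" between distributions $(PQ)_{\mathcal{Y}}$ and 
$Q_\star$, then $(\alpha, R)$ with $R = I(PQ)$ is achievable. 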

Theorem 2 provides a simple explicit lower bound on 
capacity. The distribution $(PQ)_{\mathcal{Y}}$ corresponds to the channel 
output when the input to the channel is distributed according to 
$P$. The asynchronism exponent that can be accommodated for 
given $P$ and $Q_\star$ can be interpreted as being the "equidistant 
point" between distributions $(PQ)_{\mathcal{Y}}$ and $Q_\star$, as depicted in 
Fig. 3. Maximizing over $P$ such that $I(PQ) \ge R$ gives 
the largest such exponent that can be achieved for rate $R$ 
communication. 

Note that (6) is much simpler to evaluate than the lower 
bound given by [3, Theorem 1]. Moreover, the former is 
usually a better bound than the latter and it exhibits an 
interesting feature of $\alpha(R)$ in the high rate regime. This feature 
is illustrated in Example 1 to come. 
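
For a binary output alphabet, the inner min-max in (6) reduces to a one-dimensional problem that bisection solves: between the two Bernoulli parameters, one divergence increases and the other decreases, so the minimum of their maximum sits where they cross. The sketch below is an illustration under stated assumptions. It restricts the input to the symbols 0 and 1 of a binary symmetric channel, replaces the maximization over $P$ by a grid search, and uses hypothetical function names. The parameters (crossover 0.1, noise distributed like the output of input 0) are those quoted in the caption of Fig. 5.

```python
import math


def d_b(v, p):
    """Bernoulli divergence D_B(v || p) in nats, for 0 < p < 1."""
    total = 0.0
    if v > 0:
        total += v * math.log(v / p)
    if v < 1:
        total += (1 - v) * math.log((1 - v) / (1 - p))
    return total


def h_b(q):
    """Binary entropy in nats."""
    return -sum(t * math.log(t) for t in (q, 1 - q) if t > 0)


def half_distance(p1, p2, iters=100):
    """Return (min over v of max{D_B(v||p1), D_B(v||p2)}, minimizing v)."""
    a, b = min(p1, p2), max(p1, p2)
    lo, hi = a, b
    for _ in range(iters):
        mid = (lo + hi) / 2
        # On [a, b], D_B(.||a) increases while D_B(.||b) decreases.
        if d_b(mid, a) < d_b(mid, b):
            lo = mid
        else:
            hi = mid
    v = (lo + hi) / 2
    return max(d_b(v, a), d_b(v, b)), v


def alpha_lower(R, eps, noise_one, grid=2001):
    """Grid approximation of (6) for a BSC(eps), inputs restricted to {0, 1}.

    noise_one is the probability that pure noise produces output 1.
    Returns None if no grid point supports rate R.
    """
    best = None
    for k in range(grid):
        p = k / (grid - 1)                 # P(X = 1)
        q = p * (1 - eps) + (1 - p) * eps  # probability that the output is 1
        if h_b(q) - h_b(eps) >= R - 1e-12:
            value, _ = half_distance(q, noise_one)
            best = value if best is None else max(best, value)
    return best


C = math.log(2) - h_b(0.1)
alpha_at_capacity = alpha_lower(C, eps=0.1, noise_one=0.1)
alpha_symmetric_noise = alpha_lower(C, eps=0.1, noise_one=0.5)
```

With asymmetric noise the bound stays strictly positive at $R = C$ (about 0.112 nats), whereas with symmetric noise the capacity-achieving output coincides with the noise distribution and the bound collapses to zero.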

Theorem 2 extends to the following continuous alphabet 
Gaussian setting: 

Corollary 1 (Asynchronous Gaussian channel): Suppose 
that for a real input $x$ the decoder receives $Y = x + Z$, where 
$Z \sim \mathcal{N}(0, 1)$. When there is no input to the channel, $Y = Z$, 
so $Q_\star = \mathcal{N}(0, 1)$. The input is power constrained so that 
all codewords $c^n(m)$ must satisfy $\frac{1}{n} \sum_{i=1}^{n} c_i(m)^2 \le p$ for a 
given constant $p > 0$. For this channel we have 

$$\alpha(R) \ge \max_{P :\, I(PQ) \ge R,\ \mathbb{E}_P X^2 \le p} \ \min_{V} \max\{D(V \| (PQ)_{\mathcal{Y}}), D(V \| Q_\star)\} \qquad (7)$$

for $R \in (0, C]$ where $P$ and $V$ in the optimization are 
distributions over the reals. 

If we restrict the outer maximization in (7) to be over Gaussian 
distributions only, it can be shown that the best input has a 
mean $\mu$ that is as large as possible, given the rate and power 
constraints. More precisely, $\mu$ and $R$ satisfy 

$$R = \frac{1}{2} \ln(1 + p - \mu^2),$$

and the variance of the optimal Gaussian input is $p - \mu^2$. The 
intuition for choosing such parameters is that a large mean 
helps the decoder to distinguish the codeword from noise, 
since the latter has a mean equal to zero. What limits the 
mean is both the power constraint and the variance needed to 
ensure sufficient mutual information to support communication 
at rate $R$. 
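
The relation between $\mu$, $p$, and $R$ above can be inverted to read off the Gaussian input parameters for a target rate. The sketch below does only this inversion; the function name is hypothetical, and the small numerical tolerance is an implementation choice rather than part of the result.

```python
import math


def gaussian_input(R, p):
    """Return (mu, variance) with R = 0.5 * ln(1 + variance) and mu^2 + variance = p.

    R is in nats. Raises ValueError if R exceeds 0.5 * ln(1 + p).
    """
    variance = math.exp(2 * R) - 1  # variance needed to support rate R
    mu_squared = p - variance       # remaining power goes into the mean
    if mu_squared < -1e-12:
        raise ValueError("rate exceeds 0.5 * ln(1 + p)")
    return math.sqrt(max(mu_squared, 0.0)), variance


# Hypothetical operating point: power p = 3, rate R = 0.5 * ln 2 nats.
mu, variance = gaussian_input(0.5 * math.log(2), p=3.0)
```

At this operating point the variance is 1 and the remaining power budget of 2 is spent on the mean, in line with the intuition that the mean should be as large as the constraints allow.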

Proof of Corollary 1: The proof uses a standard quantization 
argument similar to that in [10], and therefore we provide 
only a sketch of the proof. From the given continuous-alphabet 
Gaussian channel, we can form a discrete alphabet channel for 
which we can apply Theorem 2. 

More specifically, for a given constant $L > 0$, the input and 
the output of the channel are discretized within $[-L/2, L/2]$ 
into contiguous intervals $\Delta_i = [l_i, l_i + \Delta)$ of constant size $\Delta$. 
$L$ and $\Delta$ are chosen so that $L \to \infty$ as $\Delta \to 0$. To a given 
input $x$ of the Gaussian channel is associated the quantized 
value $\tilde{x} = l_i + \Delta/2$ where $i$ denotes the index of the interval 
$\Delta_i$ which contains $x$. If $x < -L/2$ or $x > L/2$, then $\tilde{x}$ is 
defined as $-L/2$ or $L/2$, respectively. The same quantization 
is applied to the output of the Gaussian channel. 

For each quantized channel we apply Theorem 2, then 
let $\Delta \to 0$ (hence $L \to \infty$). One can then verify that 
the achieved bound corresponds to (7), which shows that 
Theorem 2 also holds for the continuous alphabet Gaussian 
setting of Corollary 1. ∎ 
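
The quantization step in the proof sketch can be written out as follows. This is an illustration under assumptions not fixed by the text: the left endpoints are taken to be $-L/2 + i\Delta$ for $i = 0, 1, \ldots$, $L$ is a multiple of $\Delta$, and the boundary point $x = L/2$ is mapped to $L/2$. The function name is hypothetical.

```python
import math


def quantize(x, L, delta):
    """Map x to the midpoint of its interval, clipping outside [-L/2, L/2).

    Intervals are [l_i, l_i + delta) with l_i = -L/2 + i * delta (assumed).
    """
    if x < -L / 2:
        return -L / 2
    if x >= L / 2:
        return L / 2
    i = math.floor((x + L / 2) / delta)  # index of the interval containing x
    return -L / 2 + i * delta + delta / 2


# Hypothetical parameters: L = 4 and delta = 1 give four intervals on [-2, 2).
samples = [quantize(x, L=4.0, delta=1.0) for x in (0.3, -1.7, 5.0, -5.0)]
```

Shrinking `delta` while growing `L` refines the grid and widens its range, which is the limit taken at the end of the proof.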

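The next result provides an upper bound to the asynchronous 
capacity for channels with finite synchronization 
threshold (see Theorem 1): 

Theorem 3 (Upper Bound on Asynchronous Capacity): 
For any channel $Q$ such that $\alpha_\circ < \infty$, and any $R > 0$, we 
have that 

$$\alpha(R) \le \max_{\mathcal{S}} \min\{\alpha_1, \alpha_2\} \triangleq \alpha_+(R), \qquad (8)$$

where 

$$\alpha_1 \triangleq \delta \big( I(P_1 Q) - R + D((P_1 Q)_{\mathcal{Y}} \| Q_\star) \big) \qquad (9)$$

$$\alpha_2 \triangleq \min_{W \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}} \max\{D(W \| Q | P_2), D(W \| Q_\star | P_2)\} \qquad (10)$$

with 

$$\mathcal{S} \triangleq \big\{ (P_1, P_2, P_1', \delta) \in (\mathcal{P}^{\mathcal{X}})^3 \times [0, 1] : I(P_1 Q) \ge R,\ P_2 = \delta P_1 + (1 - \delta) P_1' \big\}. \qquad (11)$$

If $\alpha_\circ = \infty$, then 

$$\alpha(R) \le \max_{P_2} \alpha_2 \qquad (12)$$

for $R \in (0, C]$. 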

The terms $\alpha_1$ and $\alpha_2$ in (8) reflect the false-alarm and 
miss constraints alluded to above (see discussion before Theorem 2). 
If $\alpha > \alpha_1$, then with high probability the noise will 
mimic a message before transmission starts. Instead, if $\alpha > \alpha_2$ 
then reliable communication at a positive rate is impossible 
since no code can guarantee a sufficiently low probability of 
missing the sent codeword. 

The parameter $\delta$ in (9) and (11) essentially represents 
the ratio between the reaction delay $\mathbb{E}(\tau - \nu)^+$ and the 
blocklength, which need not coincide. Loosely speaking, for 
a given asynchronism level a smaller $\delta$, or, equivalently, a 
smaller $\mathbb{E}(\tau - \nu)^+$, increases the communication rate at the 
expense of a higher false-alarm error probability. The intuition 
for this is that a decoder that achieves a smaller reaction delay 
sees, on average, "fewer" channel outputs before stopping. 
As a consequence, the noise is more likely to lead such 
a decoder into confusion. A similar tension arises between 
communication rate and the miss error probability. The 
optimization over the set $\mathcal{S}$ attempts to strike the optimal tradeoff 
between the communication rate, the false-alarm and miss 
error probabilities, as well as the reaction delay as a fraction 
of the codeword length. 

Fig. 4. A channel for which $\alpha(R)$ is discontinuous at $R = C$. 

Fig. 5. Capacity upper and lower bounds on the asynchronous capacity of 
the channel of Fig. 4 with $\epsilon = 0.1$ and ★ = 0. $\alpha_-(R)$ represents the lower 
bound given by Theorem 2, LB[3] represents the lower bound obtained in [3, 
Theorem 1], and $\alpha_+(R)$ represents the upper bound given by Theorem 3. 

For channels with infinite synchronization threshold, 
Theorem 4 to come establishes that the bound given by (12) is 
actually tight. 

The following examples provide some useful insights. 

Example 1: Consider the binary symmetric channel depicted
in Fig. 4, which has the property that when no input is
supplied to the channel, the output distribution is asymmetric.
For this channel, in Fig. 5 we plot the lower bound on $\alpha(R)$
given by (6) (curve $\alpha_-(R)$) and the lower bound given by
[3, Theorem 1] (the dashed line LB[3]).^8 The $\alpha_+(R)$ curve
corresponds to the upper bound on $\alpha(R)$ given by Theorem 3.
For these plots, the channel parameter is $\epsilon = 0.1$.

The discontinuity of $\alpha(R)$ at $R = C$ (since $\alpha(R)$ is clearly
equal to zero for $R > C$) implies that we do not need to back
off from the synchronous capacity in order to operate under
exponential asynchronism.^9

^8 Due to the complexity of evaluating the lower bound given by [3,
Theorem 1], the curves labeled LB[3] are actually upper bounds on this lower
bound. We believe these bounds are fairly tight, but in any case we see that
the resulting upper bounds are below the lower bounds given by (6).

Fig. 6. Channel for which $\alpha(R)$ is continuous at $R = C$.

Note next that $\alpha_-(R)$ is better than LB[3] for all rates.
In fact, empirical evidence suggests that $\alpha_-(R)$ is better than
LB[3] in general. Additionally, note that $\alpha_-(R)$ and $\alpha_+(R)$
are not tight.

Next, we show how another binary symmetric channel has 
some rather different properties. 

Example 2: Consider the binary symmetric channel depicted
in Fig. 6, which has the property that when no input is
provided to the channel the output distribution is symmetric.
When used synchronously, this channel and that of Example 1
are completely equivalent, regardless of the crossover probability
$\epsilon$. Indeed, since the $\star$ input symbol in Fig. 6 produces
0 and 1 equiprobably, this input can be ignored for coding
purposes and any code for this channel achieves the same
performance on the channel in Fig. 4.

However, this equivalence no longer holds when the channels
are used asynchronously. To see this, we plot the corresponding
upper and lower bounds on performance for this
channel in Fig. 7. Comparing curve $\alpha_-(R)$ in Fig. 5 with
curve $\alpha_+(R)$ in Fig. 7, we see that asynchronous capacity
for the channel of Fig. 4 is always larger than that of the
current example. Moreover, since there is no discontinuity in
exponent at $R = C$ in our current example, the difference is
pronounced at $R = C = 0.368\ldots$; for the channel of Fig. 4
we have $\alpha(C) \approx 0.12 > 0$.

The discontinuity of $\alpha(R)$ at $R = C$ observed in Example 1
is in fact typical, holding in all but one special case.

Corollary 2 (Discontinuity of $\alpha(R)$ at $R = C$): We have
$\alpha(C) = 0$ if and only if $Q_\star$ corresponds to the (unique)
capacity-achieving output distribution of the synchronous
channel.

By Corollary 2, for the binary symmetric channel of Example 1,
$\alpha(R)$ is discontinuous at $R = C$ whenever $\epsilon \neq 1/2$. To
see this, note that the capacity-achieving output distribution
of the synchronous channel assigns equal weights to 0 and 1,
unlike $Q_\star$.
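As a rough numerical illustration of Corollary 2 for the channel of Example 1, the sketch below evaluates the min-max expression behind the lower bound (6) at $R = C$, where the input distribution is uniform. It assumes the idle symbol is $\star = 0$ (as in the caption of Fig. 5), so that $Q_\star$ puts mass $\epsilon$ on output 1, and it works in nats. The function names and the grid search are illustrative choices, not part of the paper.

```python
import math

def kl(p, q):
    """D(p||q) in nats for binary distributions given by their mass on symbol 1."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total

def exponent_at_capacity(eps, grid=100_000):
    """Grid search for min_V max{D(V||(PQ)_Y), D(V||Q_star)} with uniform P."""
    out_dist = 0.5  # (PQ)_Y(1) under the uniform input on a binary symmetric channel
    noise = eps     # Q_star(1), assuming the idle symbol is 0
    best = float("inf")
    for i in range(grid + 1):
        v = i / grid
        best = min(best, max(kl(v, out_dist), kl(v, noise)))
    return best

eps = 0.1
capacity = math.log(2) + eps * math.log(eps) + (1 - eps) * math.log(1 - eps)
alpha_c = exponent_at_capacity(eps)
print(f"C = {capacity:.3f} nats, lower bound on alpha(C) = {alpha_c:.3f}")
```

The resulting exponent is strictly positive because the uniform output distribution differs from $Q_\star$; setting `noise = 0.5` instead (the symmetric-noise situation of Example 2) drives it to zero.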

The justification for the discontinuity in Example 1 is
as follows. Since the capacity-achieving output distribution
of the synchronous channel (Bernoulli(1/2)) is biased with
respect to the noise distribution $Q_\star$, the hypotheses "message" and
"noise" can be discriminated with exponentially small error
probabilities. This, in turn, enables reliable detection of the
sent message under exponential asynchronism. By contrast,
for the channel of Example 2, this bias no longer exists
and $\alpha(R = C) = 0$. For this channel, to accommodate a
positive asynchronism exponent we need to back off from
the synchronous capacity $C$ so that the codebook output
distribution can be differentiated from the noise.

^9 To have a better sense of what it means to be able to decode under
exponential asynchronism and, more specifically, at $R = C$, consider the
following numerical example. Consider a codeword length $n$ equal to 150.
Then $\alpha = .12$ yields asynchronism level $A = e^{n\alpha} \approx 6.5 \times 10^7$. If the
codeword is, say, 30 centimeters long, then this means that the decoder can
reliably sequentially decode the sent message, with minimal delay (were the
decoder cognizant of $\nu$, it couldn't achieve a smaller decoding delay since we
operate at the synchronous capacity), within 130 kilometers of mostly noisy
data!

Fig. 7. Capacity upper and lower bounds on the asynchronous capacity of
the channel of Fig. 6 with $\epsilon = 0.1$. $\alpha_-(R)$ represents the lower bound given
by Theorem 2, LB[3] represents the lower bound obtained in [3, Theorem 1],
and $\alpha_+(R)$ represents the upper bound given by Theorem 3.
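The arithmetic in the footnote's numerical example ($n = 150$, $\alpha = 0.12$, a 30-centimeter codeword) can be checked with a few lines. The variable names are ours; the only modeling assumption is that the asynchronism level $A$ is counted in channel symbols, each occupying an equal share of the codeword's physical length.

```python
import math

n = 150                    # codeword length in symbols
alpha = 0.12               # asynchronism exponent at R = C
codeword_length_m = 0.30   # physical length of one codeword, in meters

asynchronism = math.exp(n * alpha)        # A = e^{n * alpha}, in symbols
symbol_length_m = codeword_length_m / n   # physical length of one symbol
window_km = asynchronism * symbol_length_m / 1000.0

print(f"A = {asynchronism:.2e} symbols, window = {window_km:.0f} km")
```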

Proof of Corollary 2: From Theorem 2, a strictly positive
asynchronism exponent can be achieved at $R = C$ if $Q_\star$ differs
from the synchronous capacity-achieving output distribution:
(6) is strictly positive for $R = C$ whenever $Q_\star$ differs from
the synchronous capacity-achieving output distribution since
the divergence between two distributions is zero only if they
are equal.

Conversely, suppose $Q_\star$ is equal to the capacity-achieving
output distribution of the synchronous channel. We show that
for any $(R, \alpha)$ coding scheme where $R = C$, $\alpha$ is necessarily
equal to zero.

From Theorem 3,

$$\alpha(R) \le \max_{\mathcal{S}} \alpha_1$$

where $\mathcal{S}$ and $\alpha_1$ are given by (11) and (9), respectively. Since
$R = C$, $I(P_1 Q) = C$, and since $Q_\star = (P_1 Q)_Y$, we have
$D((P_1 Q)_Y \| Q_\star) = 0$. Therefore, $\alpha_1 = 0$ for any $\delta$, and we
conclude that $\alpha(C) = 0$. ■

In addition to the discontinuity at $R = C$, $\alpha(R)$ may also
be discontinuous at rate zero:

Corollary 3 (Discontinuity of $\alpha(R)$ at $R = 0$): If

$$\alpha_\circ > \max_{x \in \mathcal{X}} D(Q_\star \| Q(\cdot|x)), \tag{13}$$

then $\alpha(R)$ is discontinuous at rate $R = 0$.

Example 3: Channels that satisfy (13) include those for
which the following two conditions hold: $\star$ can't produce all
channel outputs, and if a channel output can be produced by
$\star$, then it can also be produced by any other input symbol. For
these channels (13) holds trivially; the right-hand side term is
finite and the left-hand side term is infinite. The simplest such
channel is the Z-channel depicted in Fig. 8 with $\epsilon \in (0, 1)$.

Fig. 8. Channel for which $\alpha(R)$ is discontinuous at $R = 0$, assuming
$\epsilon \in (0, 1)$.

Note that if $\epsilon = 0$, (13) doesn't hold since both the left-hand
side term and the right-hand side term are infinite. In fact, if
$\epsilon = 0$ then asynchronism doesn't impact communication; rates
up to the synchronous capacity can be achieved regardless of
the level of asynchronism, i.e.,

$$\alpha(R) = \infty, \qquad R \in [0, C].$$

To see this, note that prepending a 1 to each codeword suffices
to guarantee perfect synchronization without impacting
rate (asymptotically).

More generally, asynchronous capacity for channels with
infinite synchronization threshold is established in Theorem 4
to come.

An intuitive justification for the possible discontinuity of
$\alpha(R)$ at $R = 0$ is as follows. Consider a channel where $\star$
cannot produce all channel outputs (such as that depicted in
Fig. 8). A natural encoding strategy is to start codewords with a
common preamble whose possible channel outputs differ from
the set of symbols that can be generated by $\star$. The remaining
parts of the codewords are chosen to form, for instance,
a good code for the synchronous channel. Whenever the
decoder observes symbols that cannot be produced by noise
(a clear sign of the preamble's presence), it stops and decodes
the upcoming symbols. For this strategy, the probability of
decoding before the message is actually sent is clearly zero.
Also, the probability of wrong message isolation conditioned
on correct preamble location can be made negligible by
taking codewords long enough. Similarly, the probability of
missing the preamble can be made negligible by using a
long enough preamble. Thus, the error probability of this
training-based scheme can be made negligible, regardless of
the asynchronism level.

The problem arises when we add a positive rate constraint,
which translates into a delay constraint. Conditioned on missing
the preamble, it can be shown that the delay $(\tau - \nu)^+$
is large, in fact of order $A$. It can be shown that if (13)
holds, the probability of missing the preamble is larger than
$1/A$. Therefore, a positive rate puts a limit on the maximum
asynchronism level for which reliable communication can be
guaranteed, and this limit can be smaller than $\alpha_\circ$.

We note that it is an open question whether or not $\alpha(R)$ may
be discontinuous at $R = 0$ for channels that do not satisfy (13).

Theorem 4 provides an exact characterization of capacity for
the class of channels with infinite synchronization threshold,
i.e., whose noise distribution $Q_\star$ cannot produce all possible
channel outputs.



Fig. 9. Typical shape of the capacity of an asynchronous channel $Q$ for
which $\alpha_\circ = \infty$.

Theorem 4 (Capacity when $\alpha_\circ = \infty$): If $\alpha_\circ = \infty$, then

$$\alpha(R) = \bar{\alpha} \tag{14}$$

for $R \in (0, C]$, where

$$\bar{\alpha} \triangleq \max_{P} \min_{W} \max\{D(W \| Q | P),\, D(W \| Q_\star | P)\}.$$

Therefore, when $\alpha_\circ = \infty$, $\alpha(R)$ is actually a constant
that does not depend on the rate, as Fig. 9 depicts. Phrased
differently, $R(\alpha) = C$ up to $\alpha = \bar{\alpha}$. For $\alpha > \bar{\alpha}$ we have
$R(\alpha) = 0$.

Note that when $\alpha_\circ = \infty$, $\alpha(R)$ can be discontinuous at
$R = 0$ since the right-hand side of (14) is upper bounded by

$$\max_{x \in \mathcal{X}} D(Q_\star \| Q(\cdot|x)),$$

which can be finite.^10

We conclude this section with a result of independent
interest related to synchronous communication, and which
is obtained as a byproduct of the analysis used to prove
Theorem 3. This result essentially says that any nontrivial
fixed length codebook, i.e., one that achieves a nontrivial error
probability, contains a very good large (constant composition)
sub-codebook, in the sense that its rate is almost the same as
that of the original code, but its error probability decays exponentially
with a suitable decoder. In the following theorem $(\mathcal{C}_n, \phi_n)$
denotes a standard code for a synchronous channel $Q$, with
fixed length $n$ codewords and decoding happening at time $n$.

Theorem 5: Fix a channel $Q \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$, let $q > 0$, and let
$\epsilon, \gamma > 0$ be such that $\epsilon + \gamma \in (0, l)$ with $l \in (0, 1)$. If $(\mathcal{C}_n, \phi_n)$
is a code that achieves an error probability $\epsilon$, then there exists
an $n_\circ(l, \gamma, q, |\mathcal{X}|, |\mathcal{Y}|)$ such that for all $n \ge n_\circ$ there exists
$(\mathcal{C}'_n, \phi'_n)$ such that^11

1) $\mathcal{C}'_n \subset \mathcal{C}_n$, $\mathcal{C}'_n$ is constant composition;

2) the maximum error probability is less than $\epsilon_n$ where

$$\epsilon_n = 2(n+1)^{|\mathcal{X}| \cdot |\mathcal{Y}|} \exp(-n q^2 / (2 \ln 2));$$

3)

$$\frac{\ln |\mathcal{C}'_n|}{n} \ge \frac{\ln |\mathcal{C}_n|}{n} - \gamma.$$

Theorem 5 is a stronger version of [7, Corollary 1.9, p. 107]
and its proof amounts to a tightening of some of the arguments
in the proof of the latter, but otherwise follows it closely.

^10 To see this, choose $W = Q_\star$ in the minimization (14).

^11 We use $n_\circ(q)$ to denote some threshold index which could be explicitly
given as a function of $q$.
B. Training-Based Schemes

Practical solutions to asynchronous communication usu- 
ally separate synchronization from information transmission. 
We investigate a very general class of such "training-based 
schemes" in which codewords are composed of two parts: 
a preamble that is common to all codewords, followed by 
information symbols. The decoder first attempts to detect the 
preamble, then decodes the information symbols. The results 
in this section show that such schemes are suboptimal at least 
in certain communication regimes. This leads to the conclu- 
sion that the separation of synchronization and information 
transmission is in general not optimal. 

We start by defining a general class of training-based 
schemes: 

Definition 3 (Training-Based Scheme): A coding scheme
$\{(\mathcal{C}_n, (\tau_n, \phi_n))\}_{n \ge 1}$ is said to be training-based if for some
$\eta \in [0, 1]$ and all $n$ large enough

1) there is a common preamble across codewords of size
$\eta n$;

2) the decoding time $\tau_n$ is such that the event

$$\{\tau_n = t\},$$

conditioned on the $\eta n$ observations $Y_{t-n+1}^{t-n+\eta n}$, is
independent of all other observations (i.e., $Y_1^{t-n}$ and
$Y_{t-n+\eta n+1}^{A+n-1}$).

Note that Definition 3 is in fact very general. The only
restrictions are that the codewords all start with the same
training sequence, and that the decoder's decision to stop at
any particular time should be based on the processing of (at
most) $\eta n$ past output symbols corresponding to the length of
the preamble.

In the sequel we use $\alpha^T(R)$ to denote the asynchronous
capacity restricted to training-based schemes.

Theorem 6 (Training-based scheme capacity bounds):
Capacity restricted to training-based schemes satisfies

$$\alpha^T_-(R) \le \alpha^T(R) \le \alpha^T_+(R), \qquad R \in (0, C] \tag{15}$$

where

$$\alpha^T_-(R) \triangleq m_1 \left(1 - \frac{R}{C}\right)$$

$$\alpha^T_+(R) \triangleq \min\left\{ m_2 \left(1 - \frac{R}{C}\right),\, \alpha_+(R) \right\},$$

where the constants $m_1$ and $m_2$ are defined as

$$m_1 \triangleq \max_{P} \min_{W} \max\{D(W \| Q | P),\, D(W \| Q_\star | P)\}$$

$$m_2 \triangleq -\ln\left(\min_{y \in \mathcal{Y}} Q_\star(y)\right),$$

and where $\alpha_+(R)$ is defined in Theorem 3.

Moreover, a rate $R \in [0, C]$ training-based scheme allocates
at most a fraction

$$\eta = 1 - \frac{R}{C}$$

to the preamble.

Since $m_2 < \infty$ if and only if $\alpha_\circ < \infty$, the upper bound in (15)
implies:

Corollary 4 (Asynchronism in the high rate regime): For
training-based schemes

$$\alpha^T(R) \xrightarrow{R \to C} 0$$

whenever $\alpha_\circ < \infty$.

Fig. 10. Upper and lower bounds to capacity restricted to training-based
schemes ($\alpha^T_+(R)$ and $\alpha^T_-(R)$, respectively) for the binary symmetric channel
depicted in Fig. 4 with $\epsilon = 0.1$. $\alpha_+(R)$ and $\alpha_-(R)$ represent the general
capacity upper and lower bounds given by Theorems 3 and 2, respectively.

In general, $\alpha(C) > 0$ as we saw in Corollary 2. Hence a
direct consequence of Corollaries 2 and 4 is that training-based
schemes are suboptimal in the high rate regime. Specifically,
we have the following result.

Corollary 5 (Suboptimality of training-based schemes):
There exists a channel-dependent threshold $R^*$ such that for
all $R > R^*$,

$$\alpha^T(R) < \alpha(R)$$

except possibly when $Q_\star$ corresponds to the capacity-achieving
output distribution of the synchronous channel, or
when the channel is degenerate, i.e., when $\alpha_\circ = \infty$.

The last claim of Theorem 6 says that the size of the preamble
decreases (linearly) as the rate increases. This, in turn, implies
that $\alpha^T(R)$ tends to zero as $R$ approaches $C$. Hence, in the
high rate regime most of the symbols should carry information,
and the decoder should try to detect these symbols as part
of the decoding process. In other words, synchronization
and information transmission should be jointly performed;
transmitted bits should carry information while also helping
the decoder to locate the sent codeword.

If we are willing to reduce the rate, are training-based 
schemes still suboptimal? We do not have a definite answer 
to this question, but the following examples provide some 
insights. 

Example 4: Consider the channel depicted in Fig. 4 with
$\epsilon = 0.1$. In Fig. 10, we plot the upper and lower bounds
to capacity restricted to training-based schemes given by
Theorem 6. $\alpha_-(R)$ and $\alpha_+(R)$ represent the general lower
and upper bounds to capacity given by Theorems 2 and 3; see
Fig. 5.

By comparing $\alpha_-(R)$ with $\alpha^T_+(R)$ in Fig. 10 we observe
that for rates above roughly 92% of the synchronous capacity
$C$, training-based schemes are suboptimal.
For this channel, we observe that $\alpha_-(R)$ is always above
$\alpha^T_-(R)$. This feature does not generalize to arbitrary crossover
probabilities $\epsilon$. Indeed, consider the channel in Fig. 4, but with
an arbitrary crossover probability $\epsilon$, and let $r$ be an arbitrary
constant such that $0 < r < 1$. From Theorem 6, training-based
schemes can achieve rate asynchronism pairs $(R, \alpha)$ that
satisfy

$$\alpha \ge m_1 \left(1 - R/C(\epsilon)\right), \qquad R \in (0, C(\epsilon)].$$

For the channel at hand

$$m_1 = D_B(1/2 \,\|\, \epsilon),$$

hence $\alpha$ tends to infinity as $\epsilon \to 0$, for any fixed $R \in (0, r)$;
note that $C(\epsilon) \to 1$ as $\epsilon \to 0$.
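To make the divergence of the training-based lower bound concrete, the following sketch evaluates $m_1 (1 - R/C(\epsilon))$ with $m_1 = D_B(1/2 \| \epsilon)$ for decreasing crossover probabilities at a fixed rate. It is only an illustration under stated assumptions: all quantities are computed in nats (so $C(\epsilon)$ approaches $\ln 2$ here), $D_B$ is taken to be the binary Kullback-Leibler divergence, and the function names and the rate value 0.2 are ours.

```python
import math

def binary_divergence(p, q):
    """Binary KL divergence D_B(p||q) in nats, for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def bsc_capacity(eps):
    """Capacity of a binary symmetric channel in nats, for 0 < eps < 1."""
    return math.log(2) + eps * math.log(eps) + (1 - eps) * math.log(1 - eps)

def training_lower_bound(rate, eps):
    """m_1 * (1 - R / C(eps)) with m_1 = D_B(1/2 || eps)."""
    m1 = binary_divergence(0.5, eps)
    return m1 * (1 - rate / bsc_capacity(eps))

rate = 0.2  # fixed rate in nats, below C(eps) for every eps used here
bounds = [training_lower_bound(rate, eps) for eps in (1e-1, 1e-3, 1e-6)]
for eps, bound in zip((1e-1, 1e-3, 1e-6), bounds):
    print(f"eps = {eps:g}: training-based exponent >= {bound:.3f}")
```

The bound grows without limit as $\epsilon$ shrinks because $D_B(1/2 \| \epsilon)$ behaves like $\frac{1}{2}\ln(1/\epsilon)$ for small $\epsilon$.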

Now, consider the random coding scheme that yields Theorem 2.
This scheme, which performs synchronization and
information transmission jointly, achieves for any given rate
$R \in [0, C]$ asynchronism exponent^12

$$\alpha = \max_{\{P:\, I(PQ) \ge R\}} \min_{V} \max\{D(V \| (PQ)_Y),\, D(V \| Q_\star)\}.$$

This expression is upper-bounded by^13

$$\max_{\{P:\, I(PQ) \ge R\}} D(Q_\star \| (PQ)_Y) \tag{16}$$

which is bounded in the limit $\epsilon \to 0$ as long as $R > 0$.^14
Therefore the joint synchronization-information transmission
code yielding Theorem 2 can be outperformed by training-based
schemes at moderate to low rate, even when the output
distribution when no input is supplied is asymmetric. This
shows that the general lower bound given by Theorem 2 is
loose in general.

Example 5: For the channel depicted in Fig. 6 with $\epsilon = 0.1$,
in Fig. 11 we plot the upper and lower bounds on capacity
restricted to training-based schemes, as given by Theorem 6.
For this channel it turns out that the training-based scheme
upper bound $m_2 (1 - R/C)$ (see Theorem 6) is loose and hence
$\alpha^T_+(R) = \alpha_+(R)$ for all rates. By contrast with the example
of Fig. 10, here the general lower bound $\alpha_-(R)$ is below the
lower bound for the best training-based schemes ($\alpha^T_-(R)$ line).

Finally, observe that, at all rates, $\alpha_+(R)$ in Fig. 11 is below
$\alpha_-(R)$ (and even $\alpha^T_-(R)$) in Fig. 10. In other words, under
asymmetric noise, it is possible to accommodate a much larger
level of asynchronism than under symmetric noise, at all rates.



^12 The analysis of the coding scheme that yields Theorem 2 is actually tight
in the sense that the coding scheme achieves (6) with equality (see proof of
Theorem 2 and remark p. 14.)

^13 To see this, choose $V = Q_\star$ in the minimization.

^14 Let $P^* = P^*(Q)$ be an input distribution $P$ that maximizes (16) for
a given channel. Since $R \le I(P^* Q) \le H(P^*)$, $P^*$ is uniformly bounded
away from 0 and 1 for all $\epsilon \ge 0$. This implies that (16) is bounded in the
limit $\epsilon \to 0$.

Fig. 11. Lower bound ($\alpha^T_-(R)$) to capacity restricted to training-based
schemes for the channel of Fig. 6. $\alpha_+(R)$ and $\alpha_-(R)$ represent the general
capacity upper and lower bounds given by Theorems 3 and 2, respectively. For this channel
the training upper bound ($\alpha^T_+(R)$) coincides with $\alpha_+(R)$, and hence is not
plotted separately.

V. Analysis

In this section, we establish the theorems of Section IV.



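A. Proof of Theorem 2

Let $\alpha \ge 0$ and $P \in \mathcal{P}^{\mathcal{X}}$ satisfy the assumption of the
theorem, i.e., be such that at least one of the following
inequalities holds

$$
\begin{aligned}
D(V \| (PQ)_Y) &\ge \alpha \\
D(V \| Q_\star) &\ge \alpha
\end{aligned}
\tag{17}
$$

for all distributions $V \in \mathcal{P}^{\mathcal{Y}}$, and let $A_n = e^{n(\alpha - \epsilon)}$.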

The proof is based on a random coding argument associated
with the following communication strategy. The codebook
$\mathcal{C} = \{c^n(m)\}_{m=1}^{M}$ is randomly generated so that all $c_i(m)$,
$i \in \{1, 2, \ldots, n\}$, $m \in \{1, 2, \ldots, M\}$, are i.i.d. according
to $P$. The sequential decoder operates according to a two-step
procedure. The first step consists in making a coarse estimate
of the location of the sent codeword. Specifically, at time $t$ the
decoder tries to determine whether the last $n$ output symbols
are generated by noise or by some codeword on the basis of
their empirical distribution $\hat{P} = \hat{P}_{y_{t-n+1}^{t}}$. If $D(\hat{P} \| Q_\star) < \alpha$,
$\hat{P}$ is declared a "noise type," the decoder moves to time $t + 1$,
and repeats the procedure, i.e., tests whether $\hat{P}_{y_{t-n+2}^{t+1}}$ is a noise
type. If, instead, $D(\hat{P} \| Q_\star) \ge \alpha$, the decoder marks the current
time as the beginning of the "decoding window," and proceeds
to the second step of the decoding procedure.
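The first decoding step described above can be sketched as a sliding-window test on the empirical output distribution. The code below is a minimal illustration, not the authors' implementation: it processes a recorded output sequence offline rather than sequentially, times are 1-indexed as in the text, and the function names, the toy noise distribution, and the threshold value are all assumptions made for the example.

```python
import math
from collections import Counter

def empirical_distribution(window, alphabet):
    """Type (empirical distribution) of a block of output symbols."""
    counts = Counter(window)
    return {y: counts[y] / len(window) for y in alphabet}

def divergence(p, q):
    """D(p||q) in nats for distributions given as dicts over the same alphabet."""
    total = 0.0
    for y, py in p.items():
        if py > 0.0:
            if q[y] == 0.0:
                return math.inf
            total += py * math.log(py / q[y])
    return total

def decoding_window_start(outputs, n, noise_dist, alpha):
    """First time t (1-indexed) whose last n outputs are not a "noise type".

    Returns None if every length-n window has divergence from the noise
    distribution strictly below alpha.
    """
    alphabet = list(noise_dist)
    for t in range(n, len(outputs) + 1):
        p_hat = empirical_distribution(outputs[t - n:t], alphabet)
        if divergence(p_hat, noise_dist) >= alpha:
            return t
    return None

# Toy run: idle noise that mostly emits 0, with a burst of 1s starting at time 51.
noise = {0: 0.9, 1: 0.1}
received = [0] * 50 + [1, 0, 1, 1, 0, 1, 0, 1] + [0] * 10
start = decoding_window_start(received, n=8, noise_dist=noise, alpha=0.3)
print(start)
```

In this toy run the test fires once enough 1s have entered the window for its type to be unlikely under the noise distribution; a decoder following the paper's procedure would then switch to the typicality test of the second step.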

The second step consists in exactly locating and identifying
the sent codeword. Once the beginning of the decoding
window has been marked, the decoder makes a decision the
first time that the previous $n$ symbols are jointly typical with
one of the codewords. If no such time is found within $n$
successive time steps, the decoder stops and declares a random
message. The typicality decoder operates as follows.^15 Let $P_m$
be the probability measure induced by codeword $c^n(m)$ and
the channel, i.e.,

$$P_m(a, b) \triangleq \hat{P}_{c^n(m)}(a)\, Q(b|a), \qquad (a, b) \in \mathcal{X} \times \mathcal{Y}. \tag{18}$$

At time $t$, the decoder computes the empirical distributions
$\hat{P}_m$ induced by $c^n(m)$ and the $n$ output symbols $y_{t-n+1}^{t}$ for
all $m \in \{1, 2, \ldots, M\}$. If

$$|\hat{P}_{c^n(m),\, y_{t-n+1}^{t}}(a, b) - P_m(a, b)| \le \mu$$

for all $(a, b) \in \mathcal{X} \times \mathcal{Y}$ and a unique index $m$, the decoder
declares message $m$ as the sent message. Otherwise, it moves
one step ahead and repeats the second step of the decoding
procedure on the basis of $y_{t-n+2}^{t+1}$, i.e., it tests whether $y_{t-n+2}^{t+1}$
is typical with a codeword.

At the end of the asynchronism time window, i.e., at time
$A_n + n - 1$, if $\hat{P}_{y_{A_n}^{A_n+n-1}}$ is either a noise type or if it is typical
with none of the codewords, the decoder declares a message
at random.

Throughout the argument we assume that the typicality
parameter $\mu$ is a negligible, strictly positive quantity.

^15 In the literature this decoder is often referred to as the "strong typicality"
decoder.
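The typicality test of the second step can be illustrated with a short sketch. It compares the joint type of a candidate codeword and an output block against the product of the codeword's own type and the channel, entry by entry, and declares a message only when exactly one codeword passes. The representation of the channel as a dictionary keyed by (input, output) pairs, the function names, and the toy codebook are assumptions made for this example.

```python
from collections import Counter

def is_typical(codeword, outputs, channel, mu):
    """Check |P_hat_{c,y}(a,b) - P_hat_c(a) Q(b|a)| <= mu for every pair (a,b).

    `channel` maps (a, b) to Q(b|a) and is assumed to list every pair in X x Y.
    """
    n = len(codeword)
    input_type = {a: c / n for a, c in Counter(codeword).items()}
    joint = {pair: c / n for pair, c in Counter(zip(codeword, outputs)).items()}
    for (a, b), q in channel.items():
        target = input_type.get(a, 0.0) * q
        if abs(joint.get((a, b), 0.0) - target) > mu:
            return False
    return True

def typicality_decode(codebook, outputs, channel, mu):
    """Index of the unique codeword typical with `outputs`, or None."""
    hits = [m for m, c in enumerate(codebook) if is_typical(c, outputs, channel, mu)]
    return hits[0] if len(hits) == 1 else None

# Toy binary symmetric channel with crossover probability 0.25.
bsc = {(0, 0): 0.75, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.75}
book = [[0, 0, 0, 0, 1, 1, 1, 1], [1, 1, 1, 1, 0, 0, 0, 0]]
block = [0, 0, 0, 1, 1, 1, 1, 0]
decoded = typicality_decode(book, block, bsc, mu=0.05)
print(decoded)
```

Returning None when zero or several codewords pass mirrors the rule that the decoder moves one step ahead unless a unique index satisfies the test.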

We first show that, on average, a randomly chosen codebook
combined with the sequential decoding procedure described
above achieves the rate-exponent pairs $(R, \alpha)$ claimed by the
theorem. This, as we show at the end of the proof, implies the
existence of a nonrandom codebook that, together with the
above decoding procedure, achieves any pair $(R, \alpha)$ claimed
by the theorem.

Let $\ln M / n = I(PQ) - \epsilon$, $\epsilon > 0$. We first compute the
average, over messages and codes, expected reaction delay
and probability of error. These quantities, by symmetry of the
encoding and decoding procedures, are the same as the average
over codes expected reaction delay and probability of error
conditioned on the sending of a particular message. Below,
expected reaction delay and error probability are computed
conditioned on the sending of message $m = 1$.

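Define the following events:

$$
\begin{aligned}
\mathcal{E}_1 &= \{D(\hat{P}_{Y_{\nu}^{\nu+n-1}} \| Q_\star) < \alpha, \text{ i.e., } \hat{P}_{Y_{\nu}^{\nu+n-1}} \text{ is a "noise type"}\}, \\
\mathcal{E}_2 &= \{Y_{\nu}^{\nu+n-1} \text{ is not typical with } C^n(1)\}, \\
\mathcal{E}_3 &= \{D(\hat{P}_{Y_{t-n+1}^{t}} \| Q_\star) \ge \alpha \text{ for some } t < \nu\}.
\end{aligned}
$$

For the reaction delay we have

$$
\begin{aligned}
\mathbb{E}_1(\tau_n - \nu)^+
&= \mathbb{E}_1[(\tau_n - \nu)^+ \mathbb{1}(\tau_n \ge \nu + 2n)] \\
&\quad + \mathbb{E}_1[(\tau_n - \nu)^+ \mathbb{1}(\nu + n \le \tau_n < \nu + 2n)] \\
&\quad + \mathbb{E}_1[(\tau_n - \nu)^+ \mathbb{1}(\tau_n < \nu + n)] \\
&\le (A_n + n - 1)\, \mathbb{P}_1(\tau_n \ge \nu + 2n) \\
&\quad + 2n\, \mathbb{P}_1(\nu + n \le \tau_n < \nu + 2n) + n,
\end{aligned}
\tag{19}
$$

where the subscript 1 in $\mathbb{E}_1$ and $\mathbb{P}_1$ indicates conditioning on
the event that message $m = 1$ is sent. The two probability
terms on the right-hand side of the inequality in (19)
are bounded as follows.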

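The term $\mathbb{P}_1(\tau_n \ge \nu + 2n)$ is upper bounded by the
probability that the decoding window starts after time $\nu + n - 1$.
This, in turn, is upper bounded by the probability of the event
that, at time $\nu + n - 1$, the last $n$ output symbols induce a
noise type. Therefore, we have

$$
\begin{aligned}
\mathbb{P}_1(\tau_n \ge \nu + 2n) &\le \mathbb{P}_1(\mathcal{E}_1) \\
&\le \sum_{\{V \in \mathcal{P}_n^{\mathcal{Y}}:\, D(V \| Q_\star) < \alpha\}} e^{-n D(V \| (PQ)_Y)} \\
&\le \sum_{\{V \in \mathcal{P}_n^{\mathcal{Y}}:\, D(V \| Q_\star) < \alpha\}} e^{-n\alpha} \\
&\le \mathrm{poly}(n)\, e^{-n\alpha},
\end{aligned}
\tag{20}
$$

where the second inequality follows from the definition of the
event $\mathcal{E}_1$ and Fact 2; where the third inequality follows from
(17) (which implies that if $D(V \| Q_\star) < \alpha$ then necessarily
$D(V \| (PQ)_Y) \ge \alpha$); and where the fourth inequality follows
from Fact 1.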

The probability $\mathbb{P}_1(\nu + n \le \tau_n < \nu + 2n)$ is at most the
probability that the decoder has not stopped by time $\nu + n - 1$.
This probability, in turn, is at most the probability that, at time
$\nu + n - 1$, the last $n$ output symbols either induce a noise type,
or are not typical with the sent codeword $C^n(1)$ (recall that
message $m = 1$ is sent). By the union bound we get

$$
\begin{aligned}
\mathbb{P}_1(\nu + n \le \tau_n < \nu + 2n) &\le \mathbb{P}_1(\tau_n \ge \nu + n) \\
&\le \mathbb{P}_1(\mathcal{E}_1) + \mathbb{P}_1(\mathcal{E}_2) \\
&\le \mathrm{poly}(n)\, e^{-n\alpha} + o(1) \\
&= o(1) \quad (n \to \infty),
\end{aligned}
\tag{21}
$$

where we used the last three computation steps of (20) to
bound $\mathbb{P}_1(\mathcal{E}_1)$, and where we used [7, Lemma 2.12, p. 34] to
show that $\mathbb{P}_1(\mathcal{E}_2)$ tends to zero as $n$ tends to infinity. From
(19), (20), and (21), we deduce that

$$\mathbb{E}_1(\tau_n - \nu)^+ \le n(1 + o(1)) \quad (n \to \infty)$$

since $A_n = e^{n(\alpha - \epsilon)}$, by assumption.

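We now compute $\mathbb{P}_1(\mathcal{E})$, the average error probability
conditioned on sending message $m = 1$. We have

$$
\begin{aligned}
\mathbb{P}_1(\mathcal{E})
&= \mathbb{P}_1(\mathcal{E} \cap \{\tau_n < \nu\}) \\
&\quad + \mathbb{P}_1(\mathcal{E} \cap \{\nu \le \tau_n \le \nu + n - 1\}) \\
&\quad + \mathbb{P}_1(\mathcal{E} \cap \{\tau_n \ge \nu + n\}) \\
&\le \mathbb{P}_1(\tau_n < \nu) + \mathbb{P}_1(\mathcal{E} \cap \{\nu \le \tau_n \le \nu + n - 1\}) \\
&\quad + \mathbb{P}_1(\tau_n \ge \nu + n) \\
&\le \mathbb{P}_1(\mathcal{E}_3) + \mathbb{P}_1(\mathcal{E} \cap \{\nu \le \tau_n \le \nu + n - 1\}) \\
&\quad + o(1) \quad (n \to \infty),
\end{aligned}
\tag{22}
$$

where for the last inequality we used the definition of $\mathcal{E}_3$ and
upper bounded $\mathbb{P}_1(\tau_n \ge \nu + n)$ using the last three computation
steps of (21).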
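For $\mathbb{P}_1(\mathcal{E}_3)$, we have

$$
\begin{aligned}
\mathbb{P}_1(\mathcal{E}_3) &= \mathbb{P}\left(\cup_{t < \nu} \{D(\hat{P}_{Y_{t-n+1}^{t}} \| Q_\star) \ge \alpha\}\right) \\
&\le A_n \sum_{\{V \in \mathcal{P}_n^{\mathcal{Y}}:\, D(V \| Q_\star) \ge \alpha\}} e^{-n D(V \| Q_\star)} \\
&\le A_n \sum_{\{V \in \mathcal{P}_n^{\mathcal{Y}}:\, D(V \| Q_\star) \ge \alpha\}} e^{-n\alpha} \\
&\le A_n e^{-n\alpha}\, \mathrm{poly}(n) \\
&= o(1) \quad (n \to \infty)
\end{aligned}
\tag{23}
$$

where the first inequality in (23) follows from the union bound
over time and Fact 2; where the third inequality follows from
Fact 1; and where the last equality holds since $A_n = e^{n(\alpha - \epsilon)}$,
by assumption.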
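The last step of (23) relies on the exponential term $e^{-n\epsilon}$ eventually dominating the polynomial factor. The sketch below evaluates the logarithm of $A_n e^{-n\alpha}\,\mathrm{poly}(n)$ under the assumption that $\mathrm{poly}(n)$ is the standard type-counting bound $(n+1)^{|\mathcal{Y}|}$; Fact 1 itself is not restated in this section, so that choice, like the function name and the parameter values, is ours.

```python
import math

def log_false_alarm_bound(n, alpha, eps, output_alphabet_size):
    """ln of A_n * e^{-n*alpha} * (n+1)^{|Y|} with A_n = e^{n*(alpha - eps)}."""
    log_async = n * (alpha - eps)                       # ln A_n
    log_poly = output_alphabet_size * math.log(n + 1)   # ln (n+1)^{|Y|}
    return log_async - n * alpha + log_poly

for n in (100, 1_000, 10_000):
    value = log_false_alarm_bound(n, alpha=0.12, eps=0.01, output_alphabet_size=2)
    print(f"n = {n}: ln(bound) = {value:.2f}")
```

For small $\epsilon$ the bound is vacuous at moderate blocklengths (its logarithm is positive) and only becomes meaningful once $n\epsilon$ exceeds $|\mathcal{Y}| \ln(n+1)$, which is why the statement is asymptotic.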
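We now show that

$$\mathbb{P}_1(\mathcal{E} \cap \{\nu \le \tau_n \le \nu + n - 1\}) = o(1) \quad (n \to \infty), \tag{24}$$

which, together with (22) and (23), shows that $\mathbb{P}_1(\mathcal{E})$ goes to
zero as $n \to \infty$.
We have

$$
\begin{aligned}
\mathbb{P}_1(\mathcal{E} \cap \{\nu \le \tau_n \le \nu + n - 1\})
&= \mathbb{P}_1\left(\cup_{t=\nu}^{\nu+n-1} \{\mathcal{E} \cap \{\tau_n = t\} \cap \mathcal{E}_3\}\right) \\
&\quad + \mathbb{P}_1\left(\cup_{t=\nu}^{\nu+n-1} \{\mathcal{E} \cap \{\tau_n = t\} \cap \mathcal{E}_3^c\}\right) \\
&\le \mathbb{P}_1(\mathcal{E}_3) + \mathbb{P}_1\left(\cup_{t=\nu}^{\nu+n-1} \{\mathcal{E} \cap \{\tau_n = t\} \cap \mathcal{E}_3^c\}\right) \\
&\le o(1) + \mathbb{P}_1(\mathcal{E} \cap \{\tau_n = \nu + n - 1\}) \\
&\quad + \mathbb{P}_1\left(\cup_{t=\nu}^{\nu+n-2} \{\mathcal{E} \cap \{\tau_n = t\} \cap \mathcal{E}_3^c\}\right) \\
&\le o(1) + o(1) \\
&\quad + \mathbb{P}_1\left(\cup_{t=\nu}^{\nu+n-2} \{\mathcal{E} \cap \{\tau_n = t\} \cap \mathcal{E}_3^c\}\right) \quad (n \to \infty)
\end{aligned}
\tag{25}
$$

where the second inequality follows from (23); where the
fourth inequality follows from the definition of event $\mathcal{E}_2$; and
where the third inequality follows from the fact that, given
the correct codeword location, i.e., $\tau_n = \nu + n - 1$, the
typicality decoder guarantees vanishing error probability since
we assumed that $\ln M / n = I(PQ) - \epsilon$ (see [7, Chapter 2.1]).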

The event $\{\mathcal{E} \cap \{\tau_n = t\} \cap \mathcal{E}_3^c\}$, with $\nu \le t \le \nu + n - 2$,
happens when a block of $n$ consecutive symbols, received
between $\nu - n + 1$ and $\nu + n - 2$, is jointly typical with a
codeword other than the sent codeword $C^n(1)$. Consider a
block $Y^n$ in this range, and let $J \in \mathcal{P}_n^{\mathcal{X},\mathcal{Y}}$ be a typical joint
type, i.e.

$$|J(x, y) - P(x) Q(y|x)| \le \mu$$

for all $(x, y) \in \mathcal{X} \times \mathcal{Y}$; recall that $\mu > 0$ is the typicality
parameter, which we assume to be a negligible quantity
throughout the proof.

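For some $1 \le k \le n - 1$, the first $k$ symbols of block $Y^n$
are generated by noise, and the remaining $n - k$ symbols are
generated by the sent codeword, i.e., corresponding to $m = 1$.
Thus, $Y^n$ is independent of any unsent codeword $C^n(m)$. The
probability that $C^n(m)$, $m \neq 1$, together with $Y^n$ yields a
particular type $J$ is upper bounded as follows:

$$
\begin{aligned}
\mathbb{P}_1(\hat{P}_{C^n(m), Y^n} = J)
&= \sum_{y^n:\, \hat{P}_{y^n} = J_Y} \mathbb{P}_1(Y^n = y^n) \sum_{x^n:\, \hat{P}_{x^n, y^n} = J} \mathbb{P}_1(C^n(m) = x^n) \\
&= \sum_{y^n:\, \hat{P}_{y^n} = J_Y} \mathbb{P}_1(Y^n = y^n) \sum_{x^n:\, \hat{P}_{x^n, y^n} = J} e^{-n(H(J_X) + D(J_X \| P))} \\
&\le \sum_{y^n:\, \hat{P}_{y^n} = J_Y} \mathbb{P}_1(Y^n = y^n)\, e^{-n H(J_X)}\, |\{x^n : \hat{P}_{x^n, y^n} = J\}| \\
&\le \sum_{y^n:\, \hat{P}_{y^n} = J_Y} \mathbb{P}_1(Y^n = y^n)\, e^{-n H(J_X)} e^{n H(J_{X|Y})} \\
&\le e^{-n I(J)}
\end{aligned}
\tag{26}
$$

where $H(J_X)$ denotes the entropy of the left marginal of $J$,

$$H(J_{X|Y}) \triangleq -\sum_{y \in \mathcal{Y}} J_Y(y) \sum_{x \in \mathcal{X}} J_{X|Y}(x|y) \ln J_{X|Y}(x|y),$$

and where $I(J)$ denotes the mutual information induced by
$J$.

The first equality in (26) follows from the independence
of $C^n(m)$ and $Y^n$, the second equality follows from [11,
Theorem 11.1.2, p. 349], and the second inequality follows
from [7, Lemma 2.5, p. 31].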

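It follows that the probability that an unsent codeword
$C^n(m)$ together with $Y^n$ yields a type $J$ that is typical, i.e.,
close to $PQ$, is upper bounded as

$$\mathbb{P}_1(\hat{P}_{C^n(m), Y^n} = J) \le e^{-n(I(PQ) - \epsilon/2)}$$

for all $n$ large enough, by continuity of the mutual information.^16

Note that the set of inequalities (26) holds for any block
of $n$ consecutive output symbols $Y^n$ that is independent of
codeword $C^n(m)$.^17 Hence, from the union bound, it follows
that

$$
\begin{aligned}
\mathbb{P}_1\left(\cup_{t=\nu}^{\nu+n-2} \{\mathcal{E} \cap \{\tau_n = t\} \cap \mathcal{E}_3^c\}\right)
&\le n \sum_{m \neq 1}\ \sum_{\{J \in \mathcal{P}_n^{\mathcal{X},\mathcal{Y}}:\, \forall (x,y) \in \mathcal{X} \times \mathcal{Y},\ |J(x,y) - P(x)Q(y|x)| \le \mu\}} \mathbb{P}(\hat{P}_{C^n(m), Y^n} = J) \\
&\le n M e^{-n(I(PQ) - \epsilon/2)}\, \mathrm{poly}(n) \\
&\le e^{-n\epsilon/2}\, \mathrm{poly}(n),
\end{aligned}
\tag{27}
$$

where the second inequality follows from Fact 1, and
where the third inequality follows from the assumption that
$\ln M / n = I(PQ) - \epsilon$. Combining (27) with (25) yields (24).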

So far, we have proved that a random codebook has a decoding
delay averaged over messages that is at most $n(1 + o(1))$
($n \to \infty$), and an error probability averaged over messages
that vanishes as $n \to \infty$, whenever $A_n = e^{n(\alpha - \epsilon)}$, $\epsilon > 0$.
This, as we now show, implies the existence of nonrandom
codebooks achieving the same performance, yielding the desired
result. The expurgation arguments we use are standard
and in the same spirit as those given in [11, p. 203-204] or
[12, p. 151].

^16 The typicality parameter $\mu = \mu(\epsilon) > 0$ is chosen small enough so that
this inequality holds.

^17 Note that the fact that $Y^n$ is partly generated by noise and partly by the
sent codeword $C^n(1)$ is not used to establish (26).

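For a particular codebook $\mathcal{C}_n$, let $\mathbb{P}(\mathcal{E}|\mathcal{C}_n)$ and
$\mathbb{E}((\tau_n - \nu)^+|\mathcal{C}_n)$ be the average, over messages, error probability and
reaction delay, respectively. We have proved that for any $\epsilon > 0$,

$$\mathbb{E}\left(\mathbb{E}((\tau_n - \nu)^+|\mathcal{C}_n)\right) \le n(1 + \epsilon)$$

and

$$\mathbb{E}\left(\mathbb{P}(\mathcal{E}|\mathcal{C}_n)\right) \le \epsilon$$

for all $n$ large enough.
Define events

$$\mathcal{A}_1 = \{\mathbb{E}((\tau_n - \nu)^+|\mathcal{C}_n) \le n(1 + \epsilon)^2\},$$

and

$$\mathcal{A}_2 = \{\mathbb{P}(\mathcal{E}|\mathcal{C}_n) \le \epsilon k\}$$

where $k$ is arbitrary.

From Markov's inequality it follows that^18

$$\mathbb{P}(\mathcal{A}_1 \cap \mathcal{A}_2) \ge 1 - \frac{1}{1 + \epsilon} - \frac{1}{k}.$$

Letting $k$ be large enough so that the right-hand side of the
above inequality is positive, we deduce that there exists a
particular code $\mathcal{C}_n$ such that

$$\mathbb{E}((\tau_n - \nu)^+|\mathcal{C}_n) \le n(1 + \epsilon)^2$$

and

$$\mathbb{P}(\mathcal{E}|\mathcal{C}_n) \le \epsilon k.$$

We now remove from $\mathcal{C}_n$ codewords with poor reaction delay
and error probability. Repeating the argument above with the
fixed code $\mathcal{C}_n$, we see that a positive fraction of the codewords
of $\mathcal{C}_n$ have expected decoding delay at most $n(1 + \epsilon)^3$ and error
probability at most $\epsilon k^2$. By only keeping this set of codewords,
we conclude that for any $\epsilon > 0$ and all $n$ large enough, there
exists a rate $R = I(PQ) - \epsilon$ code operating at asynchronism
level $A = e^{(\alpha - \epsilon)n}$ with maximum error probability less than $\epsilon$.
■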

Remark 2: It is possible to somewhat strengthen the conclusion
of Theorem 2 in two ways. First, it can be strengthened
by observing that what we actually proved is that the error
probability not only vanishes but does so exponentially in $n$.^19
Second, it can be strengthened by showing that the proposed
random coding scheme achieves (6) with equality. A proof is
deferred to Appendix A.

B. Proof of Theorem 3 

We show that any rate $R > 0$ coding scheme operates at an
asynchronism $\alpha$ bounded from above by $\max_{\mathcal{S}} \min\{\alpha_1, \alpha_2\}$,
where $\mathcal{S}$, $\alpha_1$, and $\alpha_2$ are defined in the theorem's statement.

We prove Theorem 3 by establishing the following four
claims.

The first claim says that, without loss of generality, we may
restrict ourselves to constant composition codes. Specifically, it
is possible to expurgate an arbitrary code to make it of constant
composition while impacting (asymptotically) neither the rate
nor the asynchronism exponent the original code is operating
at. In more detail, the expurgated codebook is such that all
codewords have the same type, and also so that all codewords
have the same type over the first $\Delta_n$ symbols (recall that $\Delta_n \triangleq
\max_m \mathbb{E}_m(\tau_n - \nu)^+$). The parameter $\delta$ in Theorem 3 corresponds
to the ratio $\Delta_n / n$, and $P_1$ and $P_2$ correspond to the empirical
types over the first $\Delta_n$ symbols and the whole codeword (all
$n$ symbols), respectively.

^18 Probability here is averaged over randomly generated codewords.

^19 Note that the error probability of the typicality decoder given the correct
message location, i.e., $\mathbb{P}(\mathcal{E} \cap \{\tau_n = \nu + n - 1\})$, is exponentially small in
$n$ [7, Chapter 2].

Fix an arbitrarily small constant $\epsilon > 0$.

Claim 1: Given any coding scheme $\{(\mathcal{C}_n, (\tau_n, \phi_n))\}_{n \ge 1}$
achieving $(R, \alpha)$ with $R > 0$ and $\alpha > 0$, there exists a second
coding scheme $\{(\mathcal{C}'_n, (\tau_n, \phi_n))\}_{n \ge 1}$ achieving $(R, \alpha)$ that is
obtained by expurgation, i.e., $\mathcal{C}'_n \subset \mathcal{C}_n$, $n = 1, 2, \ldots$, and that
has constant composition with respect to some distribution $P_n^1$
over the first

$$d(n) \triangleq \min\{\lfloor (1 + \epsilon)\Delta_n \rfloor, n\} \tag{28}$$

symbols, and constant composition with respect to some
distribution $P_n^2$ over $n$ symbols. (Hence, if $\lfloor (1 + \epsilon)\Delta_n \rfloor \ge n$,
then $P_n^1 = P_n^2$.) Distributions $P_n^1$ and $P_n^2$ satisfy Claims 2-4
below.

Distribution $P_n^2$ plays the same role as the codeword distribution
for synchronous communication. As such it should induce
a large enough input-output channel mutual information to
support rate $R$ communication.

Claim 2: For all $n$ large enough

$$R \le I(P_n^2 Q)(1 + \epsilon).$$

Distribution is specific to asynchronous communication. 
Intuitively, P,^ should induce an output distribution that is suf- 
ficiently different from pure noise so that to allow a decoder to 
distinguish between noise and any particular transmitted mes- 
sage when the asynchronism level corresponds to a. Proper 
message detection means that the decoder should not overreact 
to a sent codeword (i.e., declare a message before even it is 
sent), but also not miss the sent codeword. As an extreme case, 
it is possible to achieve a reaction delay E(t — i^)"*" equal to 
zero by setting r = 1, at the expense of a large probability of 
error In contrast, one clearly minimizes the error probability 
by waiting until the end of the asynchronism window, i.e., by 
setting T = A„ -|- n — 1, at the expense of the rate, which will 
be negligible in this case. 

The ability to properly detect only a single codeword with
type $P_n^2$ is captured by condition $\alpha \leq \alpha_2$ where $\alpha_2$ is defined
in the theorem's statement. This condition is equivalently
stated as:

Claim 3: For any $W \in \mathcal{P}_n^{\mathcal{Y}|\mathcal{X}}$ and for all $n$ large enough, at
least one of the following two inequalities holds:

$\alpha < D(W \| Q_\star | P_n^2) + \epsilon$,
$\alpha < D(W \| Q | P_n^2) + \epsilon$.

As it turns out, if the synchronization threshold is finite, $P_n^1$
also plays a role in the decoder's ability to properly detect the
transmitted message. This is captured by condition $\alpha \leq \alpha_1$
where $\alpha_1$ is defined in the theorem's statement. Intuitively,
$\alpha_1$ relates to the probability that the noise produces a string
of length $n$ that looks typical with the output of a randomly
selected codeword. If $\alpha > \alpha_1$, the noise produces many such
strings with high probability, which implies a large probability
of error.

Claim 4: For all $n$ large enough,

$\alpha \leq \frac{d(n)}{n}\left(I(P_n^1 Q) - R + D((P_n^1 Q)_Y \| Q_\star)\right) + \epsilon$

provided that $\alpha_\circ < \infty$.

Note that, by contrast with the condition in Claim 3, the
condition in Claim 4 also depends on the communication rate,
since the error event leading to the latter condition depends on
the number of codewords.

Before proving the above claims, we show how they imply
Theorem 3. The first part of the Theorem, i.e., when $\alpha_\circ < \infty$,
follows from Claims 1-4. To see this, note that the bounds
$\alpha_1$ and $\alpha_2$ in the Theorem correspond to the bounds of
Claims 4 and 3, respectively, maximized over $P_n^1$ and $P_n^2$.
The maximization is subject to the two constraints given by
Claims 1 and 2: $P_n^1$ and $P_n^2$ are the empirical distributions of
the codewords of $\mathcal{C}'_n$ over the first $\delta n$ symbols ($\delta \in [0, 1]$), and
over the entire codeword length, respectively, and condition
$R \leq I(P_n^1 Q)(1 + \epsilon)$ must be satisfied. Since $\epsilon > 0$ is arbitrary,
the result then follows by taking the limit $\epsilon \downarrow 0$ on the above
derived bound on $\alpha$.

Similarly, the second part of Theorem 3, i.e., when $\alpha_\circ = \infty$,
is a consequence of Claim 3 only.

We now prove the claims. As above, $\epsilon > 0$ is supposed to
be an arbitrarily small constant.

Proofs of Claims 1 and 2: We show that for all $n$ large
enough, we have

$\frac{R - \epsilon}{1 + \epsilon} \leq \frac{\ln |\mathcal{C}'_n|}{d(n)} \leq I(P_n^1 Q) + \epsilon$, (29)

where $\mathcal{C}'_n$ is a subset of codewords from $\mathcal{C}_n$ that have constant
composition $P_n^1$ over the first $d(n)$ symbols, where $d(n)$ is
defined in (28), and constant composition $P_n^2$ over $n$ symbols.
This is done via an expurgation argument in the spirit of [12,
p. 151] and [11, p. 203-204].

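We first show the left-hand side inequality of (29). Since
$\{(\mathcal{C}_n, (\tau_n, \phi_n))\}_{n \geq 1}$ achieves a rate $R$, by definition (see
Definition 1) we have

$\frac{\ln |\mathcal{C}_n|}{\Delta_n} \geq R - \epsilon/2$

for all $n$ large enough. Therefore,

$\frac{\ln |\mathcal{C}_n|}{d(n)} \geq \frac{R - \epsilon/2}{1 + \epsilon}$

for all $n$ large enough.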

Now, group the codewords of $\mathcal{C}_n$ into families such that
elements of the same family have the same type over the first
$d(n)$ symbols. Let $\mathcal{C}''_n$ be the largest such family and let $P_n^1$
be its type. Within $\mathcal{C}''_n$, consider the largest subfamily $\mathcal{C}'_n$ of
codewords that have constant composition over $n$ symbols,
and let $P_n^2$ be its type (hence, all the codewords in $\mathcal{C}'_n$ have
common type $P_n^1$ over $d(n)$ symbols and common type $P_n^2$
over $n$ symbols).
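The two-stage grouping just described can be illustrated with a short sketch. The Python below is not part of the paper: the function names `empirical_type` and `expurgate`, the toy binary codebook, and the prefix length `d = 2` are assumptions chosen purely for illustration.

```python
from collections import Counter, defaultdict


def empirical_type(symbols):
    """Return the type (symbol counts) of a sequence as a hashable tuple."""
    return tuple(sorted(Counter(symbols).items()))


def expurgate(codebook, d):
    """Keep the largest family of codewords sharing one type over the first
    d symbols and, within that family, one type over the whole codeword."""
    # Stage 1: group by the type of the length-d prefix.
    by_prefix_type = defaultdict(list)
    for codeword in codebook:
        by_prefix_type[empirical_type(codeword[:d])].append(codeword)
    largest_family = max(by_prefix_type.values(), key=len)

    # Stage 2: inside the largest family, group by the type of the full word.
    by_full_type = defaultdict(list)
    for codeword in largest_family:
        by_full_type[empirical_type(codeword)].append(codeword)
    return max(by_full_type.values(), key=len)


# Hypothetical toy codebook over a binary alphabet.
codebook = ["0011", "0101", "0110", "1001", "0111", "0001"]
subcode = expurgate(codebook, d=2)
print(subcode)
```

Because the number of types grows only polynomially in the sequence length while the codebook is exponentially large, discarding all but the largest family costs a vanishing fraction of the rate, which is the point of the argument that follows.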



By assumption, $R > 0$, so $\mathcal{C}_n$ has a number of codewords
that is exponential in $\Delta_n$. Due to Fact 1, to establish the
left-hand side inequality of (29), i.e., to show that $\mathcal{C}'_n$ achieves
essentially the same rate as $\mathcal{C}_n$, it suffices to show that the
number of subfamilies in $\mathcal{C}''_n$ is bounded by a polynomial in
$\Delta_n$. We do this assuming that $\alpha_\circ < \infty$ and that Claim 4 (to
be proved) holds.

By assumption, $\alpha_\circ < \infty$, and thus from Theorem 1 we
have that $D((PQ)_Y \| Q_\star) < \infty$ for any input distribution $P$.
Using Claim 4 and the assumption that $\alpha > 0$, we deduce
that $\liminf_{n \to \infty} d(n)/n > 0$, which implies that $n$ cannot
grow faster than linearly in $\Delta_n$. Therefore, Fact 1 implies that
the number of subfamilies of $\mathcal{C}''_n$ is bounded by a polynomial
in $\Delta_n$.

We now prove the right-hand side inequality of (29). Letting
$\mathcal{E}^c$ denote the event of a correct decoding, Markov's inequality
implies that for every message index $m$,

$\mathbb{P}_m(\{(\tau_n - \nu)^+ < (1 + \epsilon)\Delta_n\} \cap \mathcal{E}^c)$
$\geq 1 - \frac{1}{\Delta_n}\frac{\mathbb{E}_m(\tau_n - \nu)^+}{1 + \epsilon} - \mathbb{P}_m(\mathcal{E})$
$\geq 1 - \frac{1}{1 + \epsilon} - \mathbb{P}_m(\mathcal{E})$, (30)

since $\Delta_n = \max_m \mathbb{E}_m(\tau_n - \nu)^+$. The right-hand side of (30)
is strictly greater than zero for $n$ large enough because an
$(R, \alpha)$ coding scheme achieves a vanishing maximum error
probability as $n \to \infty$. This means that $\mathcal{C}'_n$ is a good code
for the synchronous channel, i.e., for $A = 1$. More precisely,
the codebook formed by truncating each codeword in $\mathcal{C}'_n$ to
include only the first $d(n)$ symbols achieves a probability of
error (asymptotically) bounded away from one with a suitable
decoding function. This implies that the right-hand side of (29)
holds for $n$ large enough by [7, Corollary 1.4, p. 104]. ■
In establishing the remaining claims of the proof, unless
otherwise stated, whenever we refer to a codeword it is assumed
to belong to codebook $\mathcal{C}'_n$. Moreover, for convenience,
and with only minor abuse of notation, we let $M$ denote the
number of codewords in $\mathcal{C}'_n$.

Proof of Claim 3: We fix $W \in \mathcal{P}_n^{\mathcal{Y}|\mathcal{X}}$ and show that for
all $n$ large enough, at least one of the two inequalities

$D(W \| Q | P_n^2) > \alpha - \epsilon$,
$D(W \| Q_\star | P_n^2) > \alpha - \epsilon$,

must hold. To establish this, it may be helpful to interpret $W$ as
the true channel behavior during the information transmission
period, i.e., as the conditional distribution induced by the
transmitted codeword and the corresponding channel output.
With this interpretation, $D(W \| Q | P_n^2)$ represents the large
deviation exponent of the probability that the underlying
channel $Q$ behaves as $W$ when the codeword distribution is $P_n^2$,
and $D(W \| Q_\star | P_n^2)$ represents the large deviation exponent of
the probability that the noise behaves as $W$ when the codeword
distribution is $P_n^2$. As it turns out, if both of the above inequalities
are reversed for a certain $W$, the asynchronism exponent is too
large. In fact, in this case both the transmitted message and
pure noise are very likely to produce such a $W$. This, in turn,
will confuse the decoder. It will either miss the transmitted
codeword or stop before the actual codeword is even sent.
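To make the two exponents concrete, the following sketch evaluates the conditional divergence $D(W \| Q | P) = \sum_x P(x) \sum_y W(y|x) \ln (W(y|x)/Q(y|x))$ numerically. It is an illustration only: the function name `conditional_divergence`, the binary symmetric channel with crossover 0.1, and the uniform noise distribution are assumptions, not quantities taken from the paper.

```python
import math


def conditional_divergence(W, Q, P):
    """D(W || Q | P) in nats.

    W and Q are row-stochastic matrices (lists of rows, one row per input
    symbol x); P is the input distribution. Returns math.inf when W puts
    mass where Q does not.
    """
    total = 0.0
    for x, px in enumerate(P):
        if px == 0:
            continue
        for w, q in zip(W[x], Q[x]):
            if w == 0:
                continue  # 0 * ln(0 / q) is taken to be 0
            if q == 0:
                return math.inf
            total += px * w * math.log(w / q)
    return total


# Hypothetical example: binary symmetric channel, uniform pure-noise output.
Q = [[0.9, 0.1], [0.1, 0.9]]
Q_star = [0.5, 0.5]
noise_rows = [Q_star, Q_star]   # the noise ignores the input symbol
P = [0.5, 0.5]

W = Q  # the channel behaves exactly as expected
print(conditional_divergence(W, Q, P))           # exponent under the channel
print(conditional_divergence(W, noise_rows, P))  # exponent under pure noise
```

In this example the behavior $W = Q$ costs nothing under the channel but has a strictly positive exponent under noise, so noise rarely imitates it; the difficulty addressed by Claim 3 arises for a $W$ that is cheap under both measures.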

In the sequel, we often use the shorthand notation $\mathcal{T}_W(m)$
for $\mathcal{T}_W^n(c^n(m))$.

Observe first that if $n$ is such that

$\mathbb{P}_m(Y_\nu^{\nu+n-1} \in \mathcal{T}_W(m)) = 0$, (31)

then

$D(W \| Q | P_n^2) = \infty$,

by Fact 3. Similarly, observe that if $n$ is such that

$\mathbb{P}_\star(Y_\nu^{\nu+n-1} \in \mathcal{T}_W(m)) = 0$, (32)

where $\mathbb{P}_\star$ denotes the probability under pure noise (i.e., the
$Y_i$'s are i.i.d. according to $Q_\star$), then

$D(W \| Q_\star | P_n^2) = \infty$.

Since the above two observations hold regardless of $m$ (because
all codewords in $\mathcal{C}'_n$ have the same type), Claim 3 holds
trivially for any value of $n$ for which (31) or (32) is satisfied.

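In the sequel, we thus restrict our attention to values of $n$
for which

$\mathbb{P}_m(Y_\nu^{\nu+n-1} \in \mathcal{T}_W(m)) \neq 0$ (33)

and

$\mathbb{P}_\star(Y_\nu^{\nu+n-1} \in \mathcal{T}_W(m)) \neq 0$. (34)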



Our approach is to use a change of measure to show
that if Claim 3 does not hold, then the expected reaction
delay grows exponentially with $n$, implying that the rate is
asymptotically equal to zero. To see this, note that any coding
scheme that achieves vanishing error probability cannot have
$\ln M$ grow faster than linearly with $n$, simply because of
the limitations imposed by the capacity of the synchronous
channel. Therefore, if $\mathbb{E}(\tau_n - \nu)^+$ grows exponentially with
$n$, the rate goes to zero exponentially with $n$. Note also that for
$\mathbb{E}(\tau_n - \nu)^+$ to grow exponentially, it suffices that $\mathbb{E}_m(\tau_n - \nu)^+$
grows exponentially for at least one message index $m$, since
$\Delta_n = \max_m \mathbb{E}_m(\tau_n - \nu)^+$ by definition.

To simplify the exposition and avoid heavy notation, in the
following arguments we disregard discrepancies due to the
rounding of noninteger quantities. We may, for instance, treat
$A/n$ as an integer even if $A$ is not a multiple of $n$. This has
no consequences on the final results, as these discrepancies
vanish when we consider codes with blocklength $n$ tending to
infinity.

We start by lower bounding the reaction delay as^{20}

$\Delta_n = \max_m \frac{1}{A} \sum_{t=1}^{A} \mathbb{E}_{m,t}(\tau_n - t)^+$
$\geq \frac{1}{3} \sum_{t=1}^{A/3} \mathbb{P}_{m,t}((\tau_n - t)^+ \geq A_n/3)$
$\geq \frac{1}{3} \sum_{t=1}^{A/3} \mathbb{P}_{m,t}(\tau_n \geq t + A_n/3)$
$\geq \frac{1}{3} \sum_{t=1}^{A/3} \mathbb{P}_{m,t}(\tau_n \geq 2A_n/3)$, (35)

where for the first inequality we used Markov's inequality.
The message index $m$ on the right-hand side of (35) will be
specified later; for now it may correspond to any message.

^{20} Recall that the subscripts $m,t$ indicate conditioning on the event that
message $m$ starts being sent at time $t$.

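We lower bound each term $\mathbb{P}_{m,t}(\tau_n \geq 2A_n/3)$ in the above
sum as

$\mathbb{P}_{m,t}(\tau_n \geq 2A_n/3)$
$\geq \mathbb{P}_{m,t}(\tau_n \geq 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_W(m)) \times \mathbb{P}_{m,t}(Y_t^{t+n-1} \in \mathcal{T}_W(m))$
$\geq \mathbb{P}_{m,t}(\tau_n \geq 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_W(m)) \times e^{-nD_1}\,\mathrm{poly}(n)$, (36)

where $D_1 \triangleq D(W \| Q | P_n^2)$, and where the second inequality
follows from Fact 3.^{21}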

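The key step is to apply the change of measure

$\mathbb{P}_{m,t}(\tau_n \geq 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_W(m)) = \mathbb{P}_\star(\tau_n \geq 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_W(m))$. (37)

To see that (37) holds, first note that for any $y^n$

$\mathbb{P}_{m,t}(\tau_n \geq 2A_n/3 \mid Y_t^{t+n-1} = y^n) = \mathbb{P}_\star(\tau_n \geq 2A_n/3 \mid Y_t^{t+n-1} = y^n)$

since distributions $\mathbb{P}_{m,t}$ and $\mathbb{P}_\star$ differ only over channel outputs
$Y_t^{t+n-1}$.

Next, since sequences inside $\mathcal{T}_W(m)$ are permutations of
each other, we have

$\mathbb{P}_{m,t}(Y_t^{t+n-1} = y^n \mid Y_t^{t+n-1} \in \mathcal{T}_W(m)) = \frac{1}{|\mathcal{T}_W(m)|} = \mathbb{P}_\star(Y_t^{t+n-1} = y^n \mid Y_t^{t+n-1} \in \mathcal{T}_W(m))$,

and we get

$\mathbb{P}_{m,t}(\tau_n \geq 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_W(m))$
$= \sum_{y^n \in \mathcal{T}_W(m)} \mathbb{P}_{m,t}(\tau_n \geq 2A_n/3 \mid Y_t^{t+n-1} = y^n) \times \mathbb{P}_{m,t}(Y_t^{t+n-1} = y^n \mid Y_t^{t+n-1} \in \mathcal{T}_W(m))$
$= \sum_{y^n \in \mathcal{T}_W(m)} \mathbb{P}_\star(\tau_n \geq 2A_n/3 \mid Y_t^{t+n-1} = y^n) \times \mathbb{P}_\star(Y_t^{t+n-1} = y^n \mid Y_t^{t+n-1} \in \mathcal{T}_W(m))$
$= \mathbb{P}_\star(\tau_n \geq 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_W(m))$.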

This proves (37). Substituting (37) into the right-hand side of
(36) and using (35), we get

$\Delta_n \geq e^{-nD_1}\,\mathrm{poly}(n) \times \sum_{t=1}^{A/3} \mathbb{P}_\star(\tau_n \geq 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_W(m))$
$\geq e^{-n(D_1 - D_2)}\,\mathrm{poly}(n) \times \sum_{t=1}^{A/3} \mathbb{P}_\star(\tau_n \geq 2A_n/3,\, Y_t^{t+n-1} \in \mathcal{T}_W(m))$,

where $D_2 \triangleq D(W \| Q_\star | P_n^2)$, and where the last inequality
follows from Fact 3. By summing only over the indices that
are multiples of $n$, we obtain the weaker inequality

$\Delta_n \geq e^{-n(D_1 - D_2)}\,\mathrm{poly}(n) \times \sum_{j=1}^{A/3n} \mathbb{P}_\star(\tau_n \geq 2A_n/3,\, Y_{jn}^{jn+n-1} \in \mathcal{T}_W(m))$. (38)

^{21} Note that the right-hand side of the first inequality in (36) is well-defined
because of (33).

Using (38), we show that $\mathbb{E}(\tau_n - \nu)^+$ grows exponentially with
$n$ whenever $D_1$ and $D_2$ are both upper bounded by $\alpha - \epsilon$. This,
as we saw above, implies that the rate is asymptotically equal
to zero, yielding Claim 3.

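Let $A = e^{\alpha n}$, and let $\mu \triangleq \epsilon/2$. We rewrite the above summation
over $A/3n$ indices as a sum of $A_1 = e^{n(\alpha - D_2 - \mu)}/3n$
superblocks of $A_2 = e^{n(D_2 + \mu)}$ indices. We have

$\sum_{j=1}^{A/3n} \mathbb{P}_\star(\tau_n \geq 2A_n/3,\, Y_{jn}^{jn+n-1} \in \mathcal{T}_W(m)) = \sum_{s=1}^{A_1} \sum_{j \in I_s} \mathbb{P}_\star(\tau_n \geq 2A_n/3,\, Y_{jn}^{jn+n-1} \in \mathcal{T}_W(m))$

where $I_s$ denotes the $s$th superblock of $A_2$ indices. Applying
the union bound (in reverse), we see that

$\sum_{s=1}^{A_1} \sum_{j \in I_s} \mathbb{P}_\star(\tau_n \geq 2A_n/3,\, Y_{jn}^{jn+n-1} \in \mathcal{T}_W(m)) \geq \sum_{s=1}^{A_1} \mathbb{P}_\star(\tau_n \geq 2A_n/3,\, \cup_{j \in I_s}\{Y_{jn}^{jn+n-1} \in \mathcal{T}_W(m)\})$.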

We now show that each term

$\mathbb{P}_\star(\tau_n \geq 2A_n/3,\, \cup_{j \in I_s}\{Y_{jn}^{jn+n-1} \in \mathcal{T}_W(m)\})$ (39)

in the above summation is large, say greater than 1/2, by
showing that each of them involves the intersection of two
large probability events. This, together with (38), implies that

$\Delta_n = \mathrm{poly}(n)\,\Omega(e^{n(\alpha - D_1 - \mu)}) \geq \Omega(\exp(n\epsilon/2))$ (40)

since $D_1 \leq \alpha - \epsilon$, yielding the desired result.^{22}

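Letting $\mathcal{E}$ denote the decoding error event, we have for all
$n$ large enough

$\epsilon \geq \mathbb{P}_m(\mathcal{E})$
$\geq \mathbb{P}_m(\mathcal{E} \mid \nu > 2A_n/3,\, \tau_n < 2A_n/3) \times \mathbb{P}_m(\nu > 2A_n/3,\, \tau_n < 2A_n/3)$
$\geq \frac{1}{2}\,\mathbb{P}_m(\nu > 2A_n/3)\,\mathbb{P}_m(\tau_n < 2A_n/3 \mid \nu > 2A_n/3)$
$\geq \frac{1}{6}\,\mathbb{P}_m(\tau_n < 2A_n/3 \mid \nu > 2A_n/3)$. (41)

^{22} Our proof shows that for all indices $n$ for which $D_1 \leq \alpha - \epsilon$ and
$D_2 \leq \alpha - \epsilon$, (40) holds. Therefore, if $D_1 \leq \alpha - \epsilon$ and $D_2 \leq \alpha - \epsilon$ for every
$n$ large enough, the reaction delay grows exponentially with $n$, and thus the
rate vanishes. In the case where $D_1 \leq \alpha - \epsilon$ and $D_2 \leq \alpha - \epsilon$ does not
hold for all $n$ large enough, but still holds for infinitely many values of $n$,
the corresponding asymptotic rate is still zero by Definition 1.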



The third inequality follows by noting that the event $\{\nu >
2A_n/3,\, \tau_n < 2A_n/3\}$ corresponds to the situation where
the decoder stops after observing only pure noise. Since a
codebook consists of at least two codewords, such an event
causes an error with probability at least 1/2 for at least one
message $m$. Thus, inequality (41) holds under the assumption
that $m$ corresponds to such a message.^{24}

Since the event $\{\tau_n < 2A_n/3\}$ depends on the channel
outputs only up to time $2A_n/3$, we have

$\mathbb{P}_m(\tau_n < 2A_n/3 \mid \nu > 2A_n/3) = \mathbb{P}_\star(\tau_n < 2A_n/3)$. (42)

Combining (42) with (41) we get

$\mathbb{P}_\star(\tau_n \geq 2A_n/3) \geq 1 - 6\epsilon$. (43)

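Now, because the $Y_{jn}^{jn+n-1}$, $j \in I_s$, are i.i.d. under $\mathbb{P}_\star$,

$\mathbb{P}_\star(\cup_{j \in I_s}\{Y_{jn}^{jn+n-1} \in \mathcal{T}_W(m)\}) = 1 - (1 - \mathbb{P}_\star(Y^n \in \mathcal{T}_W(m)))^{|I_s|}$.

From Fact 3 it follows that

$\mathbb{P}_\star(Y^n \in \mathcal{T}_W(m)) \geq \mathrm{poly}(n) \exp(-nD_2)$,

and by definition $|I_s| = e^{n(D_2 + \mu)}$, so

$\mathbb{P}_\star(\cup_{j \in I_s}\{Y_{jn}^{jn+n-1} \in \mathcal{T}_W(m)\}) = 1 - o(1) \quad (n \to \infty)$. (44)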

Combining (43) and (44), we see that each term (39) involves
the intersection of large probability events for at least one
message index $m$. For such a message index, by choosing $\epsilon$
sufficiently small, we see that for all sufficiently large $n$, every
single term (39), $s \in \{1, 2, \ldots, A_1\}$, is bigger than 1/2. ■
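The limit in (44) can be checked numerically. The sketch below evaluates $1 - (1 - p)^k$ when each block hits the type class with probability $p = \mathrm{poly}(n)\,e^{-nD_2}$ and a superblock holds $k = e^{n(D_2 + \mu)}$ independent blocks. The values $D_2 = 0.3$ and $\mu = 0.05$, and the polynomial factor $(n+1)^{-2}$, are arbitrary assumptions made only for illustration.

```python
import math


def prob_at_least_one_hit(p, k):
    """1 - (1 - p)**k, computed stably for tiny p and huge k."""
    return -math.expm1(k * math.log1p(-p))


D2, mu = 0.3, 0.05          # illustrative exponents, not from the paper
block_lengths = (100, 200, 400, 800)
probs = []
for n in block_lengths:
    p = math.exp(-n * D2) / (n + 1) ** 2   # assumed polynomial factor
    k = math.exp(n * (D2 + mu))            # number of blocks per superblock
    probs.append(prob_at_least_one_hit(p, k))

for n, prob in zip(block_lengths, probs):
    print(n, prob)
```

The expected number of hits is $kp = e^{n\mu}/\mathrm{poly}(n)$, which grows without bound, so the probability of at least one hit tends to one however small $\mu > 0$ is chosen.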

Finally, to establish the remaining Claim 4, we make use
of Theorem 5, whose proof is provided in Appendix B.
This theorem implies that any nontrivial codebook contains a
(large) set of codewords whose rate is almost the same as the
original codebook and whose error probability decays faster
than polynomially, say as $e^{-\sqrt{n}}$, with a suitable decoder. Note
that we don't use the full implication of Theorem 5.

Proof of Claim 4: The main idea behind the proof is 
that if Claim 4 does not hold, the noise is likely to produce an 
output that is "typical" with a codeword before the message 
is even sent, which means that any decoder must have large 
error probability. Although the idea is fairly simple, it turns 
out that a suitable definition for "typical" set and its related 
error probability analysis make the proof somewhat lengthy. 

Proceeding formally, consider inequality (30). This inequality
says that, with nonzero probability, the decoder makes a
correct decision and stops soon after the beginning of the
information transmission period. This motivates the definition
of a new random process, which we call the modified output
process. With a slight abuse of notation, in the remainder of
the proof we use $Y_1, Y_2, \ldots, Y_{A+n-1}$ to denote the modified
output process. The modified output process is generated as
if the sent codeword were truncated at the position $\nu + d(n)$,
where $d(n)$ is defined in (28). Hence, this process can be
thought of as the random process "viewed" by the sequential
decoder.

^{23} By assumption, see Section III.

^{24} Regarding the fourth inequality in (41), note that $\mathbb{P}_m(\nu > 2A_n/3)$
should be lower bounded by 1/4 instead of 1/3 had we taken into account
discrepancies due to rounding of noninteger quantities. As mentioned earlier,
we disregard these discrepancies as they play no role asymptotically.

Specifically, the distribution of the modified output process
is as follows. If

$n \geq \lfloor \Delta_n(1 + \epsilon) \rfloor$,

then the $Y_i$'s for

$i \in \{1, \ldots, \nu - 1\} \cup \{\nu + \lfloor \Delta_n(1 + \epsilon) \rfloor, \ldots, A_n + n - 1\}$

are i.i.d. according to $Q_\star$, whereas the block

$Y_\nu, Y_{\nu+1}, \ldots, Y_{\nu + \lfloor \Delta_n(1 + \epsilon) \rfloor - 1}$

is distributed according to $Q(\cdot | c^{d(n)})$, the output distribution
given that a randomly selected codeword has been transmitted.
Note that, in the conditioning, we use $c^{d(n)}$ instead of $c^{d(n)}(m)$
to emphasize that the output distribution is averaged over all
possible messages, i.e., by definition

$Q(y^{d(n)} | c^{d(n)}) = \frac{1}{M} \sum_{m=1}^{M} Q(y^{d(n)} | c^{d(n)}(m))$.

Instead, if

$n < \lfloor \Delta_n(1 + \epsilon) \rfloor$,

then the modified output process has the same distribution as
the original one, i.e., the $Y_i$'s for

$i \in \{1, \ldots, \nu - 1\} \cup \{\nu + n, \ldots, A_n + n - 1\}$

are i.i.d. according to $Q_\star$, whereas the block

$Y_\nu, Y_{\nu+1}, \ldots, Y_{\nu+n-1}$

is distributed according to $Q(\cdot | c^n)$.

Consider the following augmented decoder that, in addition
to declaring a message, also outputs the time interval

$[\tau_n - \lfloor \Delta_n(1 + \epsilon) \rfloor + 1,\, \tau_n - \lfloor \Delta_n(1 + \epsilon) \rfloor + 2,\, \ldots,\, \tau_n]$,

of size $\lfloor \Delta_n(1 + \epsilon) \rfloor$. A simple consequence of the right-hand
side of (30) being (asymptotically) bounded away from zero
is that, for $n$ large enough, if the augmented decoder is given
a modified output process instead of the original one, with
a strictly positive probability it declares the correct message,
and the time interval it outputs contains $\nu$.

Now, suppose the decoder is given the modified output
process and that it is revealed that the (possibly truncated)
sent codeword was sent in one of the

$r_n = \frac{(A_n + n - 1) - (\nu \bmod d(n))}{d(n)}$ (45)

consecutive blocks of duration $d(n)$, as shown in Fig. 12.
Using this additional knowledge, the decoder can now both
declare the sent message and output a list of

$\ell_n = \lceil \lfloor \Delta_n(1 + \epsilon) \rfloor / d(n) \rceil$ (46)

block positions, one of which corresponds to the sent
message, with a probability bounded away from zero for all
$n$ large enough. To do this the decoder, at time $\tau_n$, declares
the decoded message and declares the blocks that overlap
with the time indices in

$\{\tau_n - \lfloor \Delta_n(1 + \epsilon) \rfloor + 1,\, \tau_n - \lfloor \Delta_n(1 + \epsilon) \rfloor + 2,\, \ldots,\, \tau_n\}$.

Fig. 12. Parsing of the entire received sequence of size $A + n - 1$ into $r_n$
blocks of length $d(n)$, one of which being generated by the sent message,
and the others being generated by noise.

We now show that the above task, which consists of declaring
the sent message and producing a list of $\ell_n$ blocks of size $d(n)$,
one of which is the output of the transmitted message,
can be performed only if $\alpha$ satisfies Claim 4. To that aim we
consider the performance of the (optimal) maximum likelihood
decoder that observes output sequences of maximal length

$d(n) \cdot r_n$.

Given a sample $y_1, y_2, \ldots, y_{A+n-1}$ of the modified output
process, and its parsing into consecutive blocks of duration
$d(n)$, the optimal decoder outputs a list of $\ell_n$ blocks that are
most likely to occur. More precisely, the maximum likelihood
$\ell_n$-list decoder operates as follows. For each message $m$, it
finds a list of $\ell_n$ blocks $y^{d(n)}$ (among all $r_n$ blocks) that
maximize the ratio

$\frac{Q(y^{d(n)} | c^{d(n)}(m))}{Q(y^{d(n)} | \star)}$ (47)

and computes the sum of these ratios. The maximum likelihood
$\ell_n$-list decoder then outputs the list whose sum is
maximal, and declares the corresponding message.
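The decoding rule above can be sketched in a few lines. The code is an illustration under stated assumptions rather than the paper's construction: the channel is taken to be memoryless with strictly positive transition probabilities, and the names `block_likelihood_ratio` and `ml_list_decode`, the toy codebook, and the three received blocks are all hypothetical.

```python
import math


def block_likelihood_ratio(block, codeword, Q, Q_star):
    """Ratio Q(y^d | c^d(m)) / Q_star(y^d) for a memoryless channel.

    Q[x][y] is the channel transition probability and Q_star[y] the noise
    output probability; all entries are assumed strictly positive.
    """
    log_ratio = 0.0
    for y, x in zip(block, codeword):
        log_ratio += math.log(Q[x][y]) - math.log(Q_star[y])
    return math.exp(log_ratio)


def ml_list_decode(blocks, codebook, Q, Q_star, list_size):
    """For each message keep the list_size blocks with the largest ratios,
    then return the message (and block positions) whose ratio sum is largest."""
    best = None
    for m, codeword in enumerate(codebook):
        ratios = [(block_likelihood_ratio(b, codeword, Q, Q_star), j)
                  for j, b in enumerate(blocks)]
        top = sorted(ratios, reverse=True)[:list_size]
        score = sum(r for r, _ in top)
        if best is None or score > best[0]:
            best = (score, m, sorted(j for _, j in top))
    return best[1], best[2]


# Hypothetical setup: binary symmetric channel, uniform noise, two codewords.
Q = [[0.9, 0.1], [0.1, 0.9]]
Q_star = [0.5, 0.5]
codebook = [(0, 0, 1, 1), (1, 1, 1, 0)]
blocks = [(0, 1, 1, 1), (1, 1, 1, 0), (0, 0, 0, 0)]   # block 1 carries message 1

message, positions = ml_list_decode(blocks, codebook, Q, Q_star, list_size=2)
print(message, positions)
```

Decoding succeeds when the declared message is correct and the true block position appears in the list; the proof bounds the probability of this event by counting how many noise blocks achieve a ratio at least as large as that of the true block.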

The rest of the proof consists in deriving an upper bound
on the probability of correct maximum likelihood $\ell_n$-list
decoding, and showing that this bound tends to zero if Claim 4 is
not satisfied. To that aim, we first quantify the probability that
the noise distribution $Q_\star$ outputs a sequence that is typical with
a codeword, since the performance of the maximum likelihood
$\ell_n$-list decoder depends on this probability, as we show below.

By assumption, $(\mathcal{C}'_n, (\tau_n, \phi_n))$ achieves a probability of
error $\epsilon'_n \to 0$ as $n \to \infty$ at the asynchronism exponent $\alpha$. This
implies that $\mathcal{C}'_n$ can also achieve a nontrivial error probability
on the synchronous channel (i.e., with $A = 1$). Specifically, by
using the same argument as for (30), we deduce that we can
use $\mathcal{C}'_n$ on the synchronous channel, force decoding to happen
at the fixed time

$d(n) = \min\{n, \lfloor (1 + \epsilon)\Delta_n \rfloor\}$,

where $\Delta_n$ corresponds to the reaction delay obtained by
$(\mathcal{C}'_n, (\tau_n, \phi_n))$ in the asynchronous setting, and guarantee a
(maximum) probability of error $\epsilon''_n$ such that

$\epsilon''_n \leq \frac{1}{1 + \epsilon} + \epsilon'_n$

with a suitable decoder. Since the right-hand side of the above
inequality is strictly below one for $n$ large enough, Theorem 5
with $\delta = 1/4$ implies that the code $\mathcal{C}'_n$ has a large subcode
$\tilde{\mathcal{C}}_n$, i.e., of almost the same rate with respect to $d(n)$, that,
together with an appropriate decoding function $\tilde{\phi}_n$, achieves
a maximum error probability at most equal to

$2(n + 1)^{|\mathcal{X}| \cdot |\mathcal{Y}|} \exp(-\sqrt{n}/(2 \ln 2))$ (48)

for all $n$ large enough.

^{25} To see this, consider a channel output $y^{d_n \cdot r_n}$ that is composed of $r_n$
consecutive blocks of size $d_n$, where the $j$th block is generated by codeword
$c^{d(n)}$ and where all the other blocks are generated by noise. The probability
of this channel output is

$\mathbb{P}(y^{d_n \cdot r_n} | c^{d(n)}, j) = Q(y^{d(n)}(j) | c^{d(n)}) \prod_{i \neq j} Q_\star(y^{d(n)}(i))$

where $y^{d(n)}(j)$, $j \in \{1, 2, \ldots, r_n\}$, denotes the $j$th block of $y^{d_n \cdot r_n}$.

We now start a digression on the code $(\tilde{\mathcal{C}}_n, \tilde{\phi}_n)$ when used
on channel $Q$ synchronously. The point is to exhibit a set of
"typical output sequences" that cause the decoder to make
an error with "large probability." We then move back to the
asynchronous channel $Q$ and show that when Claim 4 does
not hold, the noise distribution $Q_\star$ is likely to produce typical
output sequences, thereby inducing the maximum likelihood
$\ell_n$-list decoder into error.

Unless stated otherwise, we now consider $(\tilde{\mathcal{C}}_n, \tilde{\phi}_n)$ when
used on the synchronous channel. In particular, error events
are defined with respect to this setting.

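The set of typical output sequences is obtained through a
few steps. We first define the set $\mathcal{A}_m$ with respect to codeword
$c^{d(n)}(m) \in \tilde{\mathcal{C}}_n$ as

$\mathcal{A}_m = \{y^{d(n)} \in \mathcal{T}_W(c^{d(n)}(m)) \text{ with } W \in \mathcal{P}_{d(n)}^{\mathcal{Y}|\mathcal{X}} : \mathbb{P}(\mathcal{T}_W(c^{d(n)}(m)) | c^{d(n)}(m)) \geq \sqrt{\epsilon_{d(n)}}\}$ (49)

where $\epsilon_n$ is defined in (48).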

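Note that, by using Fact 3, it can easily be checked that
$\mathcal{A}_m$ is nonempty for $n$ large enough (depending on $|\mathcal{X}|$ and
$|\mathcal{Y}|$), which we assume throughout the argument. For a fixed $m$,
consider the set of sequences in $\mathcal{A}_m$ that maximize (47). These
sequences form a set $\mathcal{T}_{\bar{Q}}(c^{d(n)}(m))$, for some $\bar{Q} \in \mathcal{P}_{d(n)}^{\mathcal{Y}|\mathcal{X}}$. It
follows that for every message index $m$ for which $c^{d(n)}(m) \in
\tilde{\mathcal{C}}_n$, we have

$\epsilon_{d(n)} \geq \mathbb{P}_m(\mathcal{E})$
$\geq \mathbb{P}_m(\mathcal{E} \mid \{Y_\nu^{\nu+d(n)-1} \in \mathcal{T}_{\bar{Q}}(c^{d(n)}(m))\}) \times \mathbb{P}_m(\{Y_\nu^{\nu+d(n)-1} \in \mathcal{T}_{\bar{Q}}(c^{d(n)}(m))\})$
$\geq \mathbb{P}_m(\mathcal{E} \mid \{Y_\nu^{\nu+d(n)-1} \in \mathcal{T}_{\bar{Q}}(c^{d(n)}(m))\}) \times \sqrt{\epsilon_{d(n)}}$
$\geq \mathbb{P}_m(\mathcal{E} \mid \{Y_\nu^{\nu+d(n)-1} \in \mathcal{B}_m\}) \times \mathbb{P}_m(\{Y_\nu^{\nu+d(n)-1} \in \mathcal{B}_m\} \mid \{Y_\nu^{\nu+d(n)-1} \in \mathcal{T}_{\bar{Q}}(c^{d(n)}(m))\}) \times \sqrt{\epsilon_{d(n)}}$
$\geq \frac{1}{2}\,\mathbb{P}_m(\{Y_\nu^{\nu+d(n)-1} \in \mathcal{B}_m\} \mid \{Y_\nu^{\nu+d(n)-1} \in \mathcal{T}_{\bar{Q}}(c^{d(n)}(m))\}) \times \sqrt{\epsilon_{d(n)}}$ (50)

where for the third inequality we used the definition of $\bar{Q}$;
where on the right-hand side of the fourth inequality we
defined the set

$\mathcal{B}_m \triangleq \{y^{d(n)} \in \mathcal{T}_{\bar{Q}}(c^{d(n)}(m)) \cap (\cup_{m' \neq m} \mathcal{T}_{\bar{Q}}(c^{d(n)}(m')))\}$;

and where the fifth inequality follows from this definition.^{26}
From (50) we get

$\mathbb{P}_m(\{Y_\nu^{\nu+d(n)-1} \in \mathcal{B}_m\} \mid \{Y_\nu^{\nu+d(n)-1} \in \mathcal{T}_{\bar{Q}}(c^{d(n)}(m))\}) \leq 2\sqrt{\epsilon_{d(n)}}$. (51)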

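Therefore, by defining $\tilde{\mathcal{B}}_m$ as
the complement of $\mathcal{B}_m$ in $\mathcal{T}_{\bar{Q}}(c^{d(n)}(m))$, it follows from (51)
that

$|\tilde{\mathcal{B}}_m| \geq (1 - 2\sqrt{\epsilon_{d(n)}})\,|\mathcal{T}_{\bar{Q}}(c^{d(n)}(m))|$,

since under $\mathbb{P}_m$ all the sequences in $\mathcal{T}_{\bar{Q}}(c^{d(n)}(m))$ are
equiprobable.

The set $\cup_{m' \neq m} \tilde{\mathcal{B}}_{m'}$ is the sought set of "typical output
sequences" that causes the decoder to make an error with
"high probability" conditioned on the sending of message
$m$ and conditioned on the channel outputting a sequence in
$\mathcal{T}_{\bar{Q}}(c^{d(n)}(m))$. This ends our digression on $(\tilde{\mathcal{C}}_n, \tilde{\phi}_n)$.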

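We now compute a lower bound on the probability under
$Q_\star$ of producing a sequence in $\cup_{m=2}^{M} \tilde{\mathcal{B}}_m$. Because the sets
$\{\tilde{\mathcal{B}}_m\}$ are disjoint, we deduce that

$|\cup_{m=2}^{M} \tilde{\mathcal{B}}_m| \geq (1 - 2\sqrt{\epsilon_{d(n)}}) \sum_{m=2}^{M} |\mathcal{T}_{\bar{Q}}(c^{d(n)}(m))|$
$\geq \frac{(1 - 2\sqrt{\epsilon_{d(n)}})}{(d(n) + 1)^{|\mathcal{X}| \cdot |\mathcal{Y}|}} (M - 1)\, e^{d(n) H(\bar{Q} | P_n^1)}$
$\geq \frac{1}{(4n)^{|\mathcal{X}||\mathcal{Y}|}}\, e^{d(n)(H(\bar{Q} | P_n^1) + \ln M / d(n))}$ (52)

for all $n$ large enough. For the second inequality we used
[7, Lemma 2.5, p. 31]. For the third inequality we used the
fact that $d(n) \leq n$, $M \geq 2$, $(1 - 2\sqrt{\epsilon_{d(n)}}) \geq 1/2$ for $n$
large enough,^{27} and that, without loss of generality, we may
assume that $|\mathcal{X}| \cdot |\mathcal{Y}| \geq 2$ since the synchronous capacity $C$ is
non-zero, as we assume throughout the paper. Hence we get

$Q_\star(\cup_{m=2}^{M} \tilde{\mathcal{B}}_m) \geq |\cup_{m=2}^{M} \tilde{\mathcal{B}}_m| \min_{y^{d(n)} \in \cup_{m=2}^{M} \tilde{\mathcal{B}}_m} Q_\star(y^{d(n)})$
$\geq \frac{1}{(4n)^{|\mathcal{X}||\mathcal{Y}|}}\, e^{d(n)(H(\bar{Q} | P_n^1) + (\ln M)/d(n))} \times e^{-d(n)(D((P_n^1 \bar{Q})_Y \| Q_\star) + H((P_n^1 \bar{Q})_Y))}$

for all $n$ large enough, where for the second inequality we
used (52) and [11, Theorem 11.1.2, p. 349]. Letting

$e_n \triangleq I(P_n^1 \bar{Q}) - (\ln M)/d(n) + D((P_n^1 \bar{Q})_Y \| Q_\star)$,

we thus have

$Q_\star(\cup_{m=2}^{M} \tilde{\mathcal{B}}_m) \geq \frac{1}{(4n)^{|\mathcal{X}||\mathcal{Y}|}}\, e^{-e_n \cdot d(n)}$ (53)

for $n$ large enough.

^{26} Note that, given that message $m$ is sent, if the channel produces a
sequence in $\mathcal{B}_m$ at its output, the (standard) optimal maximum likelihood
decoder makes an error with probability at least half. Hence the decoding
rule $\tilde{\phi}_n$ also makes an error with probability at least half.

^{27} Note that $d(n) \to \infty$ since the coding scheme under consideration
achieves a strictly positive rate.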

Using (53), we now prove Claim 4 by contradiction. Specifically,
assuming that

$\alpha > \frac{d(n)}{n}\, e_n + \epsilon/2$ for infinitely many indices $n$, (54)

we prove that, given message $m = 1$ is sent, the probability
of error of the maximum likelihood $\ell_n$-list decoder does not
converge to zero. As a final step, we prove that the opposite of
(54) implies Claim 4.
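Define the events

$\mathcal{E}_1 = \{Y_\nu^{\nu+n-1} \notin \mathcal{A}_1\}$,
$\mathcal{E}_2 = \{Z \leq \frac{1}{2} \frac{1}{(4n)^{2|\mathcal{X}||\mathcal{Y}|}}\, e^{\alpha n - e_n \cdot d(n)}\}$,

where $\mathcal{A}_1$ is defined in (49), and where $Z$ denotes the random
variable that counts the number of blocks generated by $Q_\star$
that are in $\cup_{m=2}^{M} \tilde{\mathcal{B}}_m$. Define also the complement set

$\mathcal{E}_3 \triangleq (\mathcal{E}_1 \cup \mathcal{E}_2)^c$.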

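The probability that the maximum likelihood $\ell_n$-list decoder
makes a correct decision given that message $m = 1$ is sent is
upper bounded as

$\mathbb{P}_1(\mathcal{E}^c) = \sum_{i=1}^{3} \mathbb{P}_1(\mathcal{E}^c | \mathcal{E}_i)\,\mathbb{P}_1(\mathcal{E}_i)$
$\leq \mathbb{P}_1(\mathcal{E}_1) + \mathbb{P}_1(\mathcal{E}_2) + \mathbb{P}_1(\mathcal{E}^c | \mathcal{E}_3)$. (55)

From the definition of $\mathcal{A}_1$, we have

$\mathbb{P}_1(\mathcal{E}_1) = o(1) \quad (n \to \infty)$. (56)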

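Now for $\mathbb{P}_1(\mathcal{E}_2)$. There are $r_n - 1$ blocks independently
generated by $Q_\star$ ($r_n$ is defined in (45)). Each of these blocks
has a probability at least equal to the right-hand side of (53)
to fall within $\cup_{m=2}^{M} \tilde{\mathcal{B}}_m$. Hence, using (53) we get

$\mathbb{E}_1 Z \geq (r_n - 1) \frac{1}{(4n)^{|\mathcal{X}||\mathcal{Y}|}}\, e^{-e_n d(n)} \geq \frac{1}{(4n)^{2|\mathcal{X}||\mathcal{Y}|}}\, e^{\alpha n - e_n d(n)}$ (57)

since $r_n \geq e^{\alpha n}/n$. Therefore,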



$\mathbb{P}_1(\mathcal{E}_2) \leq \mathbb{P}_1(Z \leq (\mathbb{E}_1 Z)/2)$
$\leq \frac{4}{\mathbb{E}_1 Z}$
$\leq \mathrm{poly}(n)\, e^{-\alpha n + e_n d(n)}$ (58)

where the first inequality follows from (57) and the definition
of $\mathcal{E}_2$; where for the second inequality we used Chebyshev's
inequality and the fact that the variance of a binomial is upper
bounded by its mean; and where for the third inequality we
used (57).
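The Chebyshev step in (58) can be sanity-checked numerically. For $Z \sim \mathrm{Binomial}(N, p)$ the variance is at most the mean, so $\mathbb{P}(Z \leq \mathbb{E}Z/2) \leq \mathrm{Var}(Z)/(\mathbb{E}Z/2)^2 \leq 4/\mathbb{E}Z$. The sketch below compares this bound with the exact tail; the parameters `N = 400` and `p = 0.1` are arbitrary assumptions chosen for illustration.

```python
import math


def binomial_lower_tail(num_trials, p, threshold):
    """Exact P(Z <= threshold) for Z ~ Binomial(num_trials, p)."""
    return sum(
        math.comb(num_trials, k) * p ** k * (1 - p) ** (num_trials - k)
        for k in range(0, math.floor(threshold) + 1)
    )


N, p = 400, 0.1                      # illustrative values only
mean = N * p
exact_tail = binomial_lower_tail(N, p, mean / 2)
chebyshev_bound = 4 / mean           # the bound used in (58)

print(exact_tail, chebyshev_bound)
```

The bound is loose, but it only needs to vanish when the expected number of noise blocks in the typical set grows exponentially, which is what (57) provides under assumption (54).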

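Finally for $\mathbb{P}_1(\mathcal{E}^c | \mathcal{E}_3)$. Given $\mathcal{E}_3$, the decoder sees at least

$\frac{1}{2} \frac{1}{(4n)^{2|\mathcal{X}||\mathcal{Y}|}}\, e^{\alpha n - e_n \cdot d(n)}$

time slots whose corresponding ratios (47) are at least as large
as the one induced by the correct block $Y_\nu^{\nu+d(n)-1}$. Hence,
given $\mathcal{E}_3$, the decoder produces a list of $\ell_n$ block positions, one
of which corresponds to the sent message, with probability at
most

$\mathbb{P}_1(\mathcal{E}^c | \mathcal{E}_3) \leq \ell_n \left(\frac{1}{2} \frac{1}{(4n)^{2|\mathcal{X}||\mathcal{Y}|}}\, e^{\alpha n - e_n \cdot d(n)}\right)^{-1} = \mathrm{poly}(n)\, e^{-\alpha n + e_n \cdot d(n)}$ (59)

where the first inequality follows from the union bound, and where
for the equality we used the fact that finite rate implies $\ell_n =
\mathrm{poly}(n)$.^{28}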

From (55), (56), (58), and (59), the probability that the
maximum likelihood $\ell_n$-list decoder makes a correct decision,
$\mathbb{P}_1(\mathcal{E}^c)$, is arbitrarily small for infinitely many indices $n$
whenever (54) holds. Therefore to achieve vanishing error
probability we must have, for all $n$ large enough,

$\alpha \leq \frac{d(n)}{n}\left(I(P_n^1 \bar{Q}) - (\ln M)/d(n) + D((P_n^1 \bar{Q})_Y \| Q_\star)\right) + \epsilon/2$. (60)



We now show, via a continuity argument, that the above
condition implies Claim 4. Recall that $\bar{Q} \in \mathcal{P}_{d(n)}^{\mathcal{Y}|\mathcal{X}}$, defined
just after (49), depends on $n$ and has the property

$\mathbb{P}(\mathcal{T}_{\bar{Q}}(c^{d(n)}(m)) | c^{d(n)}(m)) \geq \sqrt{\epsilon_{d(n)}}$. (61)

Now, from Fact 3 we also have the upper bound

$\mathbb{P}(\mathcal{T}_{\bar{Q}}(c^{d(n)}(m)) | c^{d(n)}(m)) \leq e^{-d(n) D(\bar{Q} \| Q | P_n^1)}$. (62)

Since $\sqrt{\epsilon_{d(n)}} = \Omega(e^{-\sqrt{d(n)}})$, from (61) and (62) we get

$D(\bar{Q} \| Q | P_n^1) \to 0$ as $n \to \infty$,

and therefore

$\|P_n^1 \bar{Q} - P_n^1 Q\| \to 0$ as $n \to \infty$,

where $\| \cdot \|$ denotes the $L_1$ norm. Hence, by continuity of the
divergence, condition (60) gives, for all $n$ large enough,

$\alpha \leq \frac{d(n)}{n}\left(I(P_n^1 Q) - (\ln M)/d(n) + D((P_n^1 Q)_Y \| Q_\star)\right) + \epsilon$ (63)

which yields Claim 4. ■

^{28} This follows from the definition of rate $R = \ln M / \mathbb{E}(\tau - \nu)^+$, the fact
that $\ln M / n \leq C$ for reliable communication, and the definition of $\ell_n$ (46).

C. Proof of Corollary 3

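By assumption $\alpha_\circ$ is nonzero since divergence is always
non-negative. This implies that the synchronous capacity is
nonzero by the last claim of Theorem 1. This, in turn, implies
that $(R, \alpha)$ is achievable for some sufficiently small $R > 0$
and $\alpha > 0$ by [3, Corollary 1].

Using Theorem 3,

$\alpha \leq \alpha(R) \leq \max_{\mathcal{S}} \alpha_2$ (64)

where $\alpha_2$ is given by expression (10). In this expression, by
letting $W = Q_\star$ in the minimization, we deduce that $\alpha_2 \leq
D(Q_\star \| Q | P_2)$, and therefore

$\max_{\mathcal{S}} \alpha_2 \leq \max_{\mathcal{S}} D(Q_\star \| Q | P_2)$
$= \max_{P_2} D(Q_\star \| Q | P_2)$
$= \max_{x \in \mathcal{X}} \sum_{y} Q_\star(y) \ln \frac{Q_\star(y)}{Q(y|x)}$
$= \max_{x \in \mathcal{X}} D(Q_\star \| Q(\cdot|x))$,

and from (64) we get

$\alpha \leq \max_{x \in \mathcal{X}} D(Q_\star \| Q(\cdot|x))$.

Since, by assumption,

$\alpha_\circ > \max_{x \in \mathcal{X}} D(Q_\star \| Q(\cdot|x))$,

and since $\alpha_\circ = \alpha(R = 0)$ by Theorem 1, it follows that $\alpha(R)$
is discontinuous at $R = 0$. ■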

D. Proof of Theorem 4 

We first exhibit a coding scheme that achieves any $(R, \alpha)$
with $R \leq C$ and

$\alpha \leq \max_{P \in \mathcal{P}^{\mathcal{X}}} \min_{W \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}} \max\{D(W \| Q | P), D(W \| Q_\star | P)\}$.

All codewords start with a common preamble that is
composed of $(\ln(n))^2$ repetitions of a symbol $x$ such that
$D(Q(\cdot|x) \| Q_\star) = \infty$ (such a symbol exists since $\alpha_\circ = \infty$).
The next $(\ln(n))^3$ symbols of each codeword are drawn from
a code that achieves a rate equal to $R - \epsilon$ on the synchronous
channel. Finally, all the codewords end with a common large
suffix $s^l$ of size $l = n - (\ln(n))^2 - (\ln(n))^3$ that has an
empirical type $P$ such that, for all $W \in \mathcal{P}_l^{\mathcal{Y}|\mathcal{X}}$, at least one of
the following two inequalities holds:

$D(W \| Q | P) \geq \alpha$,
$D(W \| Q_\star | P) \geq \alpha$.

The receiver runs two sequential decoders in parallel, and
makes a decision whenever one of the two decoders declares
a message. If the two decoders declare different messages at
the same time, the receiver declares one of the messages at
random.

The first decoder tries to identify the sent message by first
locating the preamble. At time $t$ it checks if the channel output
$y_t$ can be generated by $x$ but cannot be generated by noise,
i.e., if

$Q(y_t | x) > 0$ and $Q(y_t | \star) = 0$. (65)

If condition (65) does not hold, the decoder moves one step
ahead and checks condition (65) at time $t + 1$. If condition (65)
does hold, the decoder marks the current time as the beginning
of the "decoding window" and proceeds to the second step.
The second step consists in exactly locating and identifying the
sent codeword. Once the beginning of the decoding window
has been marked, the decoder makes a decision the first time
it observes $(\ln n)^3$ symbols that are typical with one of the
codewords. If no such time is found within $(\ln(n))^2 + (\ln(n))^3$
time steps from the time the decoding window has been
marked, the decoder declares a random message.

The purpose of the second decoder is to control the average
reaction delay by stopping the decoding process in the rare
event when the first decoder misses the codeword. Specifically,
the second "decoder" is only a stopping rule based on the
suffix $s^l$. At each time $t$ the second decoder checks whether
$D(\hat{P}_{Y_{t-l+1}^t | s^l} \| Q | P) < \alpha$. If so, the decoder stops and declares
a random message. If not, the decoder moves one step ahead.

The arguments for proving that the coding scheme described
above achieves $(R, \alpha)$ provided

$\alpha \leq \max_{P} \min_{W} \max\{D(W \| Q | P), D(W \| Q_\star | P)\}$ (66)

closely parallel those used to prove Theorem 2, and are
therefore omitted.^{29}

The converse is the second part of Theorem 3. ■

E. Proof of Theorem 6 

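1) Lower bound: To establish the lower bound in Theorem 6,
we exhibit a training based scheme with preamble size
$\eta n$ with

$\eta = (1 - R/C)$, (67)

and that achieves any rate asynchronism pair $(R, \alpha)$ such that

$\alpha \leq m_1 \left(1 - \frac{R}{C}\right)$, $R \in (0, C]$ (68)

where

$m_1 \triangleq \max_{P \in \mathcal{P}^{\mathcal{X}}} \min_{W \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}} \max\{D(W \| Q | P), D(W \| Q_\star | P)\}$.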

Fix $R \in (0, C]$ and let $\alpha$ satisfy (68). Each codeword starts
with a common preamble of size $\eta n$ where $\eta$ is given by (67)
and whose empirical distribution is equal to^{30}

$P_p \triangleq \arg\max_{P \in \mathcal{P}^{\mathcal{X}}} \left(\min_{W \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}} \max\{D(W \| Q | P), D(W \| Q_\star | P)\}\right)$.

The remaining $(1 - \eta)n$ symbols of each codeword are i.i.d.
generated according to a distribution $P$ that almost achieves
capacity of the synchronous channel, i.e., such that $I(PQ) =
C - \epsilon$ for some small $\epsilon > 0$.

Note that by (68) and (67), $\alpha$ is such that for any $W \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$
at least one of the following two inequalities holds:

$D(W \| Q | P_p) \geq \alpha/\eta$,
$D(W \| Q_\star | P_p) \geq \alpha/\eta$. (69)

^{29} In particular, note that the first decoder never stops before time $\nu$.

^{30} $P_p$ need not be a valid type for finite values of $n$, but this small
discrepancy plays no role asymptotically since $P_p$ can be approximated
arbitrarily well with types of order sufficiently large.

The preamble detection rule is to stop the first time when the
last $\eta n$ output symbols $Y_{t-\eta n+1}^t$ induce an empirical conditional
probability $\hat{P}_{Y_{t-\eta n+1}^t | x^{\eta n}}$ such that

$D(\hat{P}_{Y_{t-\eta n+1}^t | x^{\eta n}} \| Q | P_p) < D(\hat{P}_{Y_{t-\eta n+1}^t | x^{\eta n}} \| Q_\star | P_p)$ (70)

where $x^{\eta n}$ is the preamble.

When the preamble is located, the decoder makes a decision
on the basis of the upcoming $(1 - \eta)n$ output symbols
using maximum likelihood decoding. If no preamble has been
located by time $A_n + n - 1$, the decoder declares a message
at random.
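The detection rule (70) can be sketched as a sliding-window test. The code below is an illustration under assumptions, not the paper's implementation: the helper names `conditional_divergence_from_counts` and `detect_preamble`, the all-ones preamble of length 5, the binary symmetric channel, and the noise distribution are all hypothetical.

```python
import math
from collections import Counter


def conditional_divergence_from_counts(outputs, inputs, reference):
    """D(P_hat || reference | P_p), where P_hat is the empirical conditional
    type of outputs given inputs and P_p is the type of inputs.
    reference(x, y) returns the probability of output y given input x."""
    n = len(inputs)
    joint = Counter(zip(inputs, outputs))
    marginal = Counter(inputs)
    total = 0.0
    for (x, y), count in joint.items():
        ref = reference(x, y)
        if ref == 0:
            return math.inf
        total += (count / n) * math.log((count / marginal[x]) / ref)
    return total


def detect_preamble(stream, preamble, Q, Q_star):
    """Return the first index t (0-based) at which the last len(preamble)
    outputs look closer to the channel than to noise, or None."""
    window_len = len(preamble)
    for t in range(window_len - 1, len(stream)):
        window = stream[t - window_len + 1 : t + 1]
        d_channel = conditional_divergence_from_counts(
            window, preamble, lambda x, y: Q[x][y])
        d_noise = conditional_divergence_from_counts(
            window, preamble, lambda x, y: Q_star[y])
        if d_channel < d_noise:
            return t
    return None


# Hypothetical setup: noise mostly outputs 0, the preamble is all ones.
Q = [[0.9, 0.1], [0.1, 0.9]]
Q_star = [0.9, 0.1]
preamble = [1, 1, 1, 1, 1]
stream = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0]   # preamble occupies indices 6..10

stop_time = detect_preamble(stream, preamble, Q, Q_star)
print(stop_time)
```

Since both divergences share the same entropy term, the comparison in (70) reduces to a log-likelihood ratio test between the channel and the noise over the window. In this toy run the rule fires once three of the five preamble symbols have entered the window, that is, slightly before the preamble has been fully received.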

We compute the reaction delay and the error probability.
For notational convenience, instead of the decoding time, we
consider the time $\tau_n$ that the decoder detects the preamble,
i.e., the first time $t$ such that (70) holds. The actual decoding
time occurs $(1 - \eta)n$ time instants after the preamble has been
detected, i.e., at time $\tau_n + (1 - \eta)n$.

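For the reaction delay we have

$$\begin{aligned}
\mathbb{E}(\tau_n - \nu)^+ &= \mathbb{E}_1(\tau_n - \nu)^+ \\
&= \mathbb{E}_1[(\tau_n - \nu)^+ \mathbb{1}(\tau_n \ge \nu + \eta n)] + \mathbb{E}_1[(\tau_n - \nu)^+ \mathbb{1}(\tau_n \le \nu + \eta n - 1)] \\
&\le (A_n + n - 1)\,\mathbb{P}_1(\tau_n \ge \nu + \eta n) + \eta n \qquad (71)
\end{aligned}$$

where, as usual, the subscript 1 in $\mathbb{E}_1$ and $\mathbb{P}_1$ indicates
conditioning on the event that message $m = 1$ is sent. A
similar computation as in (20) yields

$$\begin{aligned}
\mathbb{P}_1(\tau_n \ge \nu + \eta n)
&\le \mathbb{P}_1\big(D(\hat{P}_{Y_{\nu}^{\nu+\eta n-1}|x^{\eta n}}\|Q|P_p) \ge \alpha/\eta\big) \\
&\le \sum_{W \in \mathcal{P}_{\eta n}^{\mathcal{Y}|\mathcal{X}}:\ D(W\|Q|P_p) \ge \alpha/\eta} e^{-\eta n D(W\|Q|P_p)} \\
&\le \text{poly}(n)\, e^{-\alpha n}. \qquad (72)
\end{aligned}$$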



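The first inequality follows from the fact that event
$\{\tau_n \ge \nu + \eta n\}$ is included in event

$$\{D(\hat{P}_{Y_{\nu}^{\nu+\eta n-1}|x^{\eta n}}\|Q|P_p) \ge D(\hat{P}_{Y_{\nu}^{\nu+\eta n-1}|x^{\eta n}}\|Q_\star|P_p)\}$$

which, in turn, is included in event

$$\{D(\hat{P}_{Y_{\nu}^{\nu+\eta n-1}|x^{\eta n}}\|Q|P_p) \ge \alpha/\eta\}$$

because of (69). The second inequality follows from Fact 2.
Hence, from (71) and (72)

$$\mathbb{E}(\tau_n - \nu)^+ \le \eta n + o(1) \qquad (73)$$

whenever $A_n = e^{n(\alpha - \epsilon)}$, $\epsilon > 0$. Since the actual decoding
time occurs $(1 - \eta)n$ time instants after $\tau_n$, where $\eta = (1 - R/C)$,
and since the code used to transmit information achieves
the capacity of the synchronous channel, the above strategy
operates at rate $R$.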

To show that the above strategy achieves vanishing error
probability, one uses arguments similar to those used to prove
Theorem 2 (see from paragraph after (21) onwards), so the
proof is omitted. There is one small caveat in the analysis that
concerns the event when the preamble is located somewhat
earlier than its actual timing, i.e., when the decoder locates the
preamble over a time period $[t - \eta n + 1, \ldots, t]$ with
$\nu \le t \le \nu + \eta n - 2$. One way to make the probability of this event vanish
as $n \to \infty$ is to have the preamble have a "sufficiently large"
Hamming distance with any of its shifts. To guarantee this,
one just needs to modify the original preamble in a few (say,
$\log n$) positions. This modifies the preamble type negligibly.
For a detailed discussion on how to make this modification,
we refer the reader to [9], where the problem is discussed in
the context of sequential frame synchronization.

Each instance of the above random coding strategy satisfies
the conditions of Definition 3; there is a common preamble of
size $\eta n$ and the decoder decides to stop at any particular time $t$
based on $Y_{t-\eta n+1}^{t}$. We now show that there exists a particular
instance yielding the desired rate and error probability.

First note that the above rate analysis only depends on the 
preamble, and not on the codebook that follows the preamble. 
Hence, because the error probability, averaged over codebooks 
and messages, vanishes, we deduce that there exists at least 
one codebook that achieves rate R and whose average over 
messages error probability tends to zero. 

From this code, we remove codewords with poor error 
probability, say whose error probabilities are at least twice 
the average error probability. The resulting expurgated code 
has a rate that tends to R and a vanishing maximum error 
probability. 
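The expurgation step can be illustrated with a short sketch. The function name `expurgate` and the example numbers are hypothetical. By Markov's inequality at most half of the codewords can have error probability at least twice the average, so at least half survive and the rate loss is at most $(\ln 2)/n$.

```python
def expurgate(error_probs, factor=2.0):
    """Indices of codewords whose error probability is below factor * average."""
    average = sum(error_probs) / len(error_probs)
    if average == 0:
        return list(range(len(error_probs))), average
    kept = [i for i, p in enumerate(error_probs) if p < factor * average]
    return kept, average

# Hypothetical per-message error probabilities of a small codebook.
probs = [0.01, 0.02, 0.03, 0.50]
kept, average = expurgate(probs)
max_error_after = max(probs[i] for i in kept)
```

After expurgation the maximum error probability is below twice the original average, which is the property used in the argument.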

2) Upper bound: To establish the upper bound it suffices
to show that, for training-based schemes, $(R, \alpha)$ with $R > 0$
must satisfy

$$\alpha \le m_2 \left(1 - \frac{R}{C}\right). \qquad (74)$$

The upper bound in Theorem 6 then follows from (74) and
the general upper bound derived in Theorem 3.

The upper bound (74) follows from the following lemma:
Lemma 1: A rate $R > 0$ coding scheme whose decoder
operates according to a sliding window stopping rule with
window size $\eta n$ cannot achieve an asynchronism exponent
larger than $\eta m_2$.

Lemma 1 says that any coding scheme with a limited memory
stopping rule capable of processing only $\eta n$ symbols at a time
achieves an asynchronism exponent at most $O(\eta)$, unless $R = 0$
or if the channel is degenerate, i.e., $\alpha_\circ = m_2 = \infty$, in which
case Lemma 1 is trivial and we have the asynchronous capacity
expression given by Theorem 4.

To deduce (74) from Lemma 1, consider a training-based
scheme which achieves a delay $\Delta$ with a non-trivial error
probability (i.e., bounded away from 0). Because the preamble
conveys no information, the rate is at most

$$C\,\frac{\min\{\Delta, n\} - \eta n}{\Delta} \le C(1 - \eta)$$

by the channel coding theorem for a synchronous channel.
Hence, for a rate $R > 0$ training-based scheme the training
fraction $\eta$ is upper bounded as

$$\eta \le 1 - \frac{R}{C}.$$

This implies (74) by Lemma 1. ■
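As a numerical illustration of (74), the sketch below evaluates the largest admissible training fraction and the resulting ceiling on the asynchronism exponent. It assumes that $m_2 = -\ln \min_y Q_\star(y)$, which is the reading consistent with the per-sequence bound used in the proof of Lemma 1; the function names and the example noise distribution are hypothetical.

```python
import math

def m2_from_noise(Q_star):
    """Assumed definition: m2 = -ln(min_y Q_star(y)); infinite if some output has mass 0."""
    q_min = min(Q_star.values())
    return math.inf if q_min == 0 else -math.log(q_min)

def training_upper_bound(R, C, Q_star):
    """Return (largest training fraction, bound (74) on the exponent) for 0 < R < C."""
    eta_max = 1.0 - R / C
    return eta_max, m2_from_noise(Q_star) * eta_max

# Hypothetical noise distribution and a rate equal to half the capacity.
eta_max, alpha_max = training_upper_bound(R=0.25, C=0.5, Q_star={0: 0.9, 1: 0.1})
```

The ceiling shrinks linearly to zero as $R$ approaches $C$, which is the high-rate penalty of training-based schemes.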
Proof of Lemma 1: The lemma holds trivially if $m_2 = \infty$.
We thus assume that $m_2 < \infty$. Consider a training-based
scheme $\{(\mathcal{C}_n, (\tau_n, \phi_n))\}_{n \ge 1}$ in the sense of Definition 3. For
notational convenience, we consider $\tau_n$ to be the time when
the decoder detects the preamble. The actual decoding time (in
the sense of Definition 3 part 2) occurs $(1 - \eta)n$ time instants
after the preamble has been detected, i.e., at time $\tau_n + (1 - \eta)n$.
This allows us to write $\tau_n$ as

$$\tau_n = \inf\{t \ge 1 : S_t = 1\},$$

where

$$S_t = S_t(Y_{t-\eta n+1}^{t}), \qquad 1 \le t \le A_n + n - 1,$$

referred to as the "stopping rule at time $t$", is a binary random
variable such that $\{S_t = 1\}$ represents the set of output
sequences $y_{t-\eta n+1}^{t}$ which make $\tau_n$ stop at time $t$, assuming
that $\tau_n$ hasn't stopped before time $t$.
Now, every sequence $y^{\eta n} \in \mathcal{Y}^{\eta n}$ satisfies

$$Q_\star(y^{\eta n}) \ge e^{-m_2 \eta n}.$$

Therefore, any deterministic stopping rule stops at any particular
time either with probability zero or with probability
at least $e^{-m_2 \eta n}$, i.e., for all $t$, either the stopping rule $S_t$
satisfies $\mathbb{P}_\star(S_t = 1) \ge e^{-m_2 \eta n}$ or it is trivial in the sense
that $\mathbb{P}_\star(S_t = 1) = 0$. For now, we assume that the stopping
rule is deterministic; the randomized case follows easily as we
describe at the end of the proof.
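The per-sequence bound can be verified by brute force for a small window. This sketch again assumes $m_2 = -\ln \min_y Q_\star(y)$; the helper `min_sequence_probability` and the example distribution are hypothetical.

```python
import math
from itertools import product

def min_sequence_probability(Q_star, k):
    """Smallest probability of any length-k output sequence under i.i.d. noise."""
    return min(
        math.prod(Q_star[y] for y in seq)
        for seq in product(Q_star, repeat=k)
    )

Q_star = {0: 0.9, 1: 0.1}
k = 4
m2 = -math.log(min(Q_star.values()))
least_likely = min_sequence_probability(Q_star, k)
floor = math.exp(-m2 * k)
```

Here the least likely sequence meets the floor $e^{-m_2 k}$ with equality, so any non-empty stopping set has noise probability at least that large.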

Let $\mathcal{S}$ denote the subset of indices $t \in \{1, 2, \ldots, A_n/4\}$
such that $S_t$ is non-trivial, and let $\mathcal{S}_k$ denote the subset of
indices in $\mathcal{S}$ that are congruent to $k$ mod $\eta n$, i.e.,

$$\mathcal{S}_k = \{t : t \in \mathcal{S},\ t = j \cdot \eta n + k,\ j = 0, 1, \ldots\}.$$

Note that for each $k$, the set of stopping rules $S_t$, $t \in \mathcal{S}_k$, are
independent since $S_t$ depends only on $Y_{t-\eta n+1}^{t}$.

By repeating the same argument as in (41)-(42), for any
$\epsilon > 0$, for all $n$ large enough and any message index $m$ the
error probability $\mathbb{P}_m(\mathcal{E})$ satisfies

$$\epsilon \ge \mathbb{P}_m(\mathcal{E}) \ge \tfrac{1}{4}\,\mathbb{P}_\star(\tau_n \le A_n/2). \qquad (75)$$

Since $\epsilon > 0$ is arbitrary, we deduce

$$\mathbb{P}_\star(\tau_n \ge A_n/2) \ge 1/2 \qquad (76)$$

i.e., a coding scheme achieves a vanishing error probability
only if the probability of stopping after time $A_n/2$ is at least
0.5 when the channel input is all $\star$'s. Thus, assuming that our
coding scheme achieves vanishing error probability, we have

$$|\mathcal{S}| < \eta n\, e^{m_2 \eta n}.$$

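To see this, note that if $|\mathcal{S}| \ge \eta n\, e^{m_2 \eta n}$, then there exists a
value $k^*$ such that $|\mathcal{S}_{k^*}| \ge e^{m_2 \eta n}$, and hence

$$\begin{aligned}
\mathbb{P}_\star(\tau_n \ge A_n/2) &\le \mathbb{P}_\star(S_t = 0,\ t \in \mathcal{S}) \\
&\le \mathbb{P}_\star(S_t = 0,\ t \in \mathcal{S}_{k^*}) \\
&\le (1 - e^{-m_2 \eta n})^{|\mathcal{S}_{k^*}|} \\
&\le (1 - e^{-m_2 \eta n})^{e^{m_2 \eta n}}.
\end{aligned}$$

Since the last term above tends to $1/e < 1/2$ for $n$ large
enough, $\mathbb{P}_\star(\tau_n \ge A_n/2) < 1/2$ for $n$ large enough, which is in
conflict with the assumption that the coding scheme achieves
vanishing error probability.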
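The limit used in this step is easy to confirm numerically. The sketch below evaluates $(1 - e^{-x})^{e^{x}}$ in the log domain, where $x$ stands for $m_2 \eta n$; the function name `no_stop_bound` is hypothetical.

```python
import math

def no_stop_bound(x):
    """(1 - e^{-x})^(e^{x}), computed via logarithms for numerical stability."""
    return math.exp(math.exp(x) * math.log1p(-math.exp(-x)))

values = [no_stop_bound(x) for x in range(1, 31)]
```

The values increase towards $1/e \approx 0.368$ and stay below $1/2$ throughout, which is what produces the contradiction with (76).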

The fact that $|\mathcal{S}| < \eta n\, e^{m_2 \eta n}$ implies, as we shall prove
later, that

$$\mathbb{P}(\tau_n \ge A_n/2 \mid \nu \le A_n/4) \ge \frac{1}{2}\left(1 - \frac{8 \eta n^2 e^{m_2 \eta n}}{A_n}\right). \qquad (77)$$

Hence,

$$\begin{aligned}
\mathbb{E}(\tau_n - \nu)^+ &\ge \mathbb{E}((\tau_n - \nu)^+ \mid \tau_n \ge A_n/2,\ \nu \le A_n/4) \times \mathbb{P}(\tau_n \ge A_n/2,\ \nu \le A_n/4) \\
&\ge \frac{A_n}{16}\, \mathbb{P}(\tau_n \ge A_n/2 \mid \nu \le A_n/4) \\
&\ge \frac{A_n}{32}\left(1 - \frac{8 \eta n^2 e^{m_2 \eta n}}{A_n}\right) \qquad (78)
\end{aligned}$$

where for the second inequality we used the fact that $\nu$ is
uniformly distributed, and where the third inequality holds by
(77). Letting $A_n = e^{\alpha n}$, from (78) we deduce that if $\alpha > m_2 \eta$,
then $\mathbb{E}(\tau_n - \nu)^+$ grows exponentially with $n$, implying
that the rate is asymptotically zero.^^ Hence a sliding window
stopping rule which operates on a window of size $\eta n$ cannot
accommodate a positive rate while achieving an asynchronism
exponent larger than $\eta m_2$. This establishes the desired result.
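To see how quickly the bound (78) takes off, the sketch below evaluates the logarithm of its right-hand side with $A_n = e^{\alpha n}$. The constants follow the reconstruction of (78) given here, and the function name `log_delay_lower_bound` is hypothetical.

```python
import math

def log_delay_lower_bound(n, alpha, eta, m2):
    """ln of (A_n/32) * (1 - 8*eta*n^2*e^{m2*eta*n}/A_n) with A_n = e^{alpha*n}.

    Returns None when the bracket is not positive, i.e. the bound is vacuous.
    """
    log_ratio = math.log(8 * eta * n * n) + (m2 * eta - alpha) * n
    if log_ratio >= 0:
        return None
    return alpha * n - math.log(32) + math.log1p(-math.exp(log_ratio))

informative = log_delay_lower_bound(n=100, alpha=1.0, eta=0.5, m2=1.0)
vacuous = log_delay_lower_bound(n=100, alpha=0.4, eta=0.5, m2=1.0)
```

With $\alpha > m_2 \eta$ the logarithm of the bound grows like $\alpha n$, so the expected delay is exponential in $n$; with $\alpha < m_2 \eta$ the bound says nothing.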

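We now show (77). Let $\mathcal{N}$ be the subset of indices in
$\{1, 2, \ldots, A_n/4\}$ with the following property. For any $t \in \mathcal{N}$,
the $2n$ indices $\{t, t+1, \ldots, t+2n-1\}$ do not belong to $\mathcal{S}$,
i.e., all $2n$ of the associated stopping rules are trivial. Then
we have

$$\begin{aligned}
\mathbb{P}(\tau_n \ge A_n/2 \mid \nu \le A_n/4) &\ge \mathbb{P}(\tau_n \ge A_n/2 \mid \nu \in \mathcal{N}) \times \mathbb{P}(\nu \in \mathcal{N} \mid \nu \le A_n/4) \\
&= \mathbb{P}(\tau_n \ge A_n/2 \mid \nu \in \mathcal{N})\, \frac{|\mathcal{N}|}{A_n/4} \qquad (79)
\end{aligned}$$

since $\nu$ is uniformly distributed. Using that $|\mathcal{S}| < \eta n\, e^{m_2 \eta n}$,

$$|\mathcal{N}| \ge A_n/4 - 2 \eta n^2 e^{m_2 \eta n},$$

hence from (79)

$$\mathbb{P}(\tau_n \ge A_n/2 \mid \nu \le A_n/4) \ge \mathbb{P}(\tau_n \ge A_n/2 \mid \nu \in \mathcal{N}) \left(1 - \frac{8 \eta n^2 e^{m_2 \eta n}}{A_n}\right). \qquad (80)$$

Now, when $\nu \in \mathcal{N}$, all stopping times that could potentially
depend on the transmitted codeword symbols are actually
trivial, so the event $\{\tau_n \ge A_n/2\}$ is independent of the
symbols sent at times $\nu, \nu + 1, \ldots, \nu + n - 1$. Therefore,

$$\mathbb{P}(\tau_n \ge A_n/2 \mid \nu \in \mathcal{N}) = \mathbb{P}_\star(\tau_n \ge A_n/2). \qquad (81)$$

Combining (81) with (80), and using (76), gives the desired claim (77).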

Finally, to see that randomized stopping rules also cannot
achieve asynchronism exponents larger than $\eta m_2$, note
that a randomized stopping rule can be viewed as simply a
probability distribution over deterministic stopping rules. The
previous analysis shows that for any deterministic stopping
rule, and any asynchronism exponent larger than $\eta m_2$, either
the probability of error is large (e.g., at least 1/8), or the
expected delay is exponential in $n$. Therefore, the same holds
for randomized stopping rules. ■

^^Any coding scheme that achieves vanishing error probability cannot have
$\ln M$ grow faster than linearly with $n$, because of the limitation imposed
by the capacity of the synchronous channel. Hence, if $\mathbb{E}(\tau_n - \nu)^+$ grows
exponentially with $n$, the rate goes to zero exponentially with $n$.

F. Comments on Error Criteria 

We end this section by commenting on maximum versus 
average rate/error probability criteria. The results in this paper 
consider the rate defined with respect to maximum (over mes- 
sages) reaction delay and consider maximum (over messages) 
error probability. Hence all the achievability results also hold 
when delay and error probability are averaged over messages.

To see that the converse results in this paper also hold for
the average case, we use the following standard expurgation
argument. Assume $\{(\mathcal{C}_n, (\tau_n, \phi_n))\}$ is an $(R, \alpha)$ coding scheme
where the error probability and the delay of $(\mathcal{C}_n, (\tau_n, \phi_n))$ are
defined as

$$\mathcal{E}_n \triangleq \frac{1}{M} \sum_{m=1}^{M} \mathbb{P}_m(\mathcal{E})$$

and

$$\bar{\Delta}_n \triangleq \frac{1}{M} \sum_{m=1}^{M} \mathbb{E}_m(\tau_n - \nu)^+,$$

respectively. By definition of an $(R, \alpha)$ coding scheme, this
means that given some arbitrarily small $\epsilon > 0$, and for all $n$
large enough,

$$\mathcal{E}_n \le \epsilon \qquad \text{and} \qquad \frac{\ln M}{\bar{\Delta}_n} \ge R - \epsilon.$$

Hence, for $n$ large enough and any $\delta > 1$, one can find a
(nonzero) constant fraction of codewords $\mathcal{C}_n' \subset \mathcal{C}_n$ ($\mathcal{C}_n'$ is the
"expurgated" ensemble) that satisfies the following property:
the rate defined with respect to maximum (over $\mathcal{C}_n'$) delay is at
least $(R - \epsilon)/\delta$ and the maximum error probability is less than
$\eta \epsilon$, where $\eta = \eta(\delta) > 0$. One then applies the converse results
to the expurgated ensemble to derive bounds on $(R/\delta, \alpha)$, and
thus on $(R, \alpha)$, since $\delta > 1$ can be chosen arbitrarily.

VI. Concluding Remarks 

We analyzed a model for asynchronous communication 
which captures the situation when information is emitted 
infrequently. General upper and lower bounds on capacity 
were derived, which coincide in certain cases. The forms 
of these bounds are similar and have two parts: a mutual 
information part and a divergence part. The mutual information 
part is reminiscent of synchronous communication: to achieve 
a certain rate, there must be, on average, enough mutual 
information between the time information is sent and the time 
it is decoded. The divergence part is novel, and comes from 
asynchronism. Asynchronism introduces two additional error 
events that must be overcome by the decoder. The first event 
happens when the noise produces a channel output that looks 
as if it was generated by a codeword. The larger the level 
of asynchronism, the more likely this event becomes. The 
second event happens when the channel behaves atypically,
which results in the decoder missing the codeword. When this
event happens, the rate penalty is huge, on the order of the 
asynchronism level. As such, the second event contributes to 
increased average reaction delay, or equivalently, lowers the 
rate. The divergence part in our upper and lower bounds on 
capacity strikes a balance between these two events. 

An important conclusion of our analysis is that, in general, 
training-based schemes are not optimal in the high rate, high 
asynchronism regime. In this regime, training-based architec- 
tures are unreliable, whereas it is still possible to achieve 
an arbitrarily low probability of error using strategies that 
combine synchronization with information transmission. 

Finally, we note that further analysis is possible when we 
restrict attention to a simpler slotted communication model 
in which the possible transmission slots are nonoverlapping 
and contiguous. In particular, for this more constrained model 
[13] develops a variety of results, among which is that except 
in somewhat pathological cases, training-based schemes are 
strictly suboptimal at all rates below the synchronous capacity. 
Additionally, the performance gap is quantified for the special 
cases of the binary symmetric and additive white Gaussian 
noise channels, where it is seen to be significant in the high 
rate regime but vanish in the limit of low rates. Whether the
characteristics observed for the slotted model are also shared 
by unslotted models remains to be determined, and is a natural 
direction for future research.

Appendix A
Proof of Remark 2 (p. 14)

To show that the random coding scheme proposed in the
proof of Theorem 2 achieves (6) with equality, we show that

$$\alpha \le \max_{P:\, I(PQ) \ge R}\ \min_{V \in \mathcal{P}^{\mathcal{Y}}} \max\{D(V\|(PQ)_{\mathcal{Y}}),\, D(V\|Q_\star)\}. \qquad (82)$$

Recall that, by symmetry of the encoding and decoding
procedures, the average reaction delay is the same for any
message. Hence

$$\Delta_n = \mathbb{E}_1(\tau_n - \nu)^+,$$

where $\mathbb{E}_1$ denotes expectation under the probability measure
$\mathbb{P}_1$, the channel output distribution when message 1 is sent,
averaged over time and codebooks.
Suppose for the moment that

$$\mathbb{E}_1(\tau_n - \nu)^+ \ge n(1 - o(1)) \qquad (n \to \infty). \qquad (83)$$

It then follows from Fano's inequality that the input distribution
$P$ must satisfy $I(PQ) \ge R$. Hence, to establish (82) we
will show that at least one of the following inequalities

$$D(V\|(PQ)_{\mathcal{Y}}) \ge \alpha, \qquad D(V\|Q_\star) \ge \alpha \qquad (84)$$

holds for any $V \in \mathcal{P}^{\mathcal{Y}}$. The arguments are similar to those
used to establish Claim 3 of Theorem 3. Below we provide
the key steps.

We proceed by contradiction and show that if both the
inequalities in (84) are reversed, then the asymptotic rate is
zero. To that aim we provide a lower bound on $\mathbb{E}_1(\tau_n - \nu)^+$.

Let $\tau_n'$ denote the time of the beginning of the decoding
window, i.e., the first time when the previous $n$ output symbols
have empirical distribution $\hat{P}$ such that $D(\hat{P}\|Q_\star) \ge \alpha$. By
definition, $\tau_n \ge \tau_n'$, so

$$\begin{aligned}
\mathbb{E}_1(\tau_n - \nu)^+ &\ge \mathbb{E}_1(\tau_n' - \nu)^+ \\
&\ge \frac{1}{3} \sum_{t=1}^{A_n/3} \mathbb{P}_{1,t}(\tau_n' \ge 2A_n/3), \qquad (85)
\end{aligned}$$

where the second inequality follows from Markov's inequality,
and where $\mathbb{P}_{1,t}$ denotes the probability measure at the output
of the channel conditioned on the event that message 1 starts
being sent at time $t$, and averaged over codebooks. Note that,
because $\tau_n'$ is not a function of the codebook, there is no
averaging on the stopping times.

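Fix $V \in \mathcal{P}_n^{\mathcal{Y}}$. We lower bound each term $\mathbb{P}_{1,t}(\tau_n' \ge 2A_n/3)$
in the above sum as

$$\begin{aligned}
\mathbb{P}_{1,t}(\tau_n' \ge 2A_n/3)
&\ge \mathbb{P}_{1,t}(\tau_n' \ge 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_V)\, \mathbb{P}_{1,t}(Y_t^{t+n-1} \in \mathcal{T}_V) \\
&\ge \mathbb{P}_{1,t}(\tau_n' \ge 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_V)\, e^{-n D_1}\, \text{poly}(n), \qquad (86)
\end{aligned}$$

where $D_1 = D(V\|(PQ)_{\mathcal{Y}})$, and where the second inequality
follows from Fact 2.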
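Fact 2 is not restated in this part of the paper; assuming it is the standard method-of-types estimate $(n+1)^{-|\mathcal{Y}|} e^{-nD(V\|p)} \le p^n(\mathcal{T}_V) \le e^{-nD(V\|p)}$, the bound used in (86) can be checked exactly for small $n$. The function names below are hypothetical.

```python
import math

def type_class_probability(counts, p):
    """Exact probability that n i.i.d. draws from p have the given symbol counts."""
    n = sum(counts)
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    log_prob = sum(c * math.log(pi) for c, pi in zip(counts, p) if c > 0)
    return math.exp(log_coeff + log_prob)

def divergence_from_counts(counts, p):
    """D(V||p) in nats, where V is the type defined by the counts (p positive)."""
    n = sum(counts)
    return sum((c / n) * math.log((c / n) / pi)
               for c, pi in zip(counts, p) if c > 0)

counts, p = [5, 15], [0.5, 0.5]
n = sum(counts)
exact = type_class_probability(counts, p)
upper = math.exp(-n * divergence_from_counts(counts, p))
lower = upper / (n + 1) ** len(counts)
```

The exact probability lies between the two exponential estimates, with only a polynomial factor separating them.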

The key change of measure step (37) results now in the
equality

$$\mathbb{P}_{1,t}(\tau_n' \ge 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_V) = \mathbb{P}_\star(\tau_n' \ge 2A_n/3 \mid Y_t^{t+n-1} \in \mathcal{T}_V), \qquad (87)$$

which can easily be checked by noticing that the probability
of any sequence $y_t^{t+n-1}$ in $\mathcal{T}_V$ is the same under $\mathbb{P}_{1,t}$.
Substituting (87) into the right-hand side of (86), and using
(85) and Fact 2, we get

$$\mathbb{E}_1(\tau_n - \nu)^+ \ge e^{-n(D_1 - D_2)}\, \text{poly}(n) \times \sum_{t=1}^{A_n/3} \mathbb{P}_\star(\tau_n' \ge 2A_n/3,\ Y_t^{t+n-1} \in \mathcal{T}_V), \qquad (88)$$

where $D_2 = D(V\|Q_\star)$. The rest of the proof consists in
showing that if the two inequalities in (84) are reversed, then
the right-hand side of the above inequality grows exponentially
with $n$, which results in an asymptotic rate equal to zero.
The arguments closely parallel the ones that prove Claim 3 of
Theorem 3 (see from (38) onwards), and hence are omitted.

'^For different codebook realizations, the stopping rule $\tau_n'$ is the same, by
contrast with $\tau_n$ which depends on the codebook via the joint typicality
criterion of the second phase.

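To conclude the proof we show (83). Using the alternate
form of expectation for non-negative random variables
$\mathbb{E}X = \sum_{k \ge 0} \mathbb{P}(X > k)$, we have

$$\begin{aligned}
\mathbb{E}_1(\tau_n - \nu)^+ &\ge \sum_{i=1}^{g(n)} \mathbb{P}_1(\tau_n \ge \nu + i) \\
&\ge \sum_{i=1}^{g(n)} \big(1 - \mathbb{P}_1(\tau_n < \nu + i)\big) \\
&\ge g(n)\big(1 - \mathbb{P}_1(\tau_n < \nu + g(n))\big),
\end{aligned}$$

where we defined

$$g(n) \triangleq n - \lceil n^{3/4} \rceil,$$

and where the last inequality follows from the fact that
$\mathbb{P}_1(\tau_n < \nu + i)$ is a non-decreasing function of $i$. Since
$g(n) = n(1 - o(1))$, to establish (83) it suffices to show that

$$\mathbb{P}_1(\tau_n < \nu + g(n)) = o(1) \qquad (n \to \infty). \qquad (89)$$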
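The tail-sum identity and the truncation used above can be checked on a toy distribution. This is a small illustrative sketch; the helper names and the example probability mass function are hypothetical.

```python
import math

def expectation_direct(pmf):
    """E[X] for a non-negative integer random variable given as {value: prob}."""
    return sum(k * p for k, p in pmf.items())

def expectation_tail_sum(pmf):
    """E[X] computed as the sum over i >= 1 of P(X >= i)."""
    k_max = max(pmf)
    return sum(sum(p for k, p in pmf.items() if k >= i)
               for i in range(1, k_max + 1))

def truncated_lower_bound(pmf, g):
    """g * (1 - P(X < g)), a lower bound on the first g terms of the tail sum."""
    return g * (1 - sum(p for k, p in pmf.items() if k < g))

pmf = {0: 0.1, 3: 0.2, 10: 0.7}
direct = expectation_direct(pmf)
tail = expectation_tail_sum(pmf)
bound = truncated_lower_bound(pmf, g=5)
```

The bound is loose in general, but it becomes tight when the random variable rarely falls below $g$, which is the situation established by (89).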

Since

$$\mathbb{P}_1(\tau_n < \nu) = o(1) \qquad (n \to \infty),$$

as follows from computation steps in (22) and (23), to establish
(89) it suffices to show that

$$\mathbb{P}_1(\nu \le \tau_n < \nu + g(n)) = o(1) \qquad (n \to \infty). \qquad (90)$$

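For $i \in \{0, 1, \ldots, g(n)\}$ we have

$$\begin{aligned}
\mathbb{P}_1(\tau_n = \nu + i)
&\le \mathbb{P}_1\big(\|\hat{P}_{C^n(1),\,Y_{\nu+i-n+1}^{\nu+i}} - PQ\| \le \mu \cdot |\mathcal{X}| \cdot |\mathcal{Y}|\big) \\
&= \sum_{J} \mathbb{P}_1\big(\hat{P}_{C^n(1),\,Y_{\nu+i-n+1}^{\nu+i}} = J\big) \qquad (91)
\end{aligned}$$

where the above summation is over all typical joint types, i.e.,
all $J \in \mathcal{P}_n^{\mathcal{X},\mathcal{Y}}$ such that

$$\big|\hat{P}_{C^n(1),\,Y_{\nu+i-n+1}^{\nu+i}}(a, b) - P(a)Q(b|a)\big| \le \mu \qquad (92)$$

for all $(a, b) \in \mathcal{X} \times \mathcal{Y}$.

We upper bound each term in this summation. First observe
that event

$$\{\hat{P}_{C^n(1),\,Y_{\nu+i-n+1}^{\nu+i}} = J\},$$

for $i \in \{0, 1, \ldots, g(n)\}$, involves random vector $Y_{\nu+i-n+1}^{\nu+i}$
which is partly generated by noise and partly generated by
the transmitted codeword corresponding to message 1. In the
following computation $k$ refers to the first symbols of $Y_{\nu+i-n+1}^{\nu+i}$
which are generated by noise, i.e., by definition $k = n - (i+1)$.
Note that since $0 \le i \le g(n)$, we have

$$\lceil n^{3/4} \rceil - 1 \le k \le n - 1.$$

We have

$$\mathbb{P}_1\big(\hat{P}_{C^n(1),\,Y_{\nu+i-n+1}^{\nu+i}} = J\big) = \sum_{kJ_1 + (n-k)J_2 = nJ} \Bigg( \sum_{(x^k, y^k):\ \hat{P}_{x^k, y^k} = J_1} P(x^k)\, Q_\star(y^k) \Bigg) \times \Bigg( \sum_{(x^{n-k}, y^{n-k}):\ \hat{P}_{x^{n-k}, y^{n-k}} = J_2} P(x^{n-k}, y^{n-k}) \Bigg), \qquad (93)$$

where we used the following shorthand notations for probabilities

$$P(x^k) = \prod_{j=1}^{k} P(x_j), \qquad Q_\star(y^k) = \prod_{j=1}^{k} Q_\star(y_j), \qquad P(x^{n-k}, y^{n-k}) = \prod_{j=1}^{n-k} P(x_j)\, Q(y_j|x_j).$$

Further, using Fact 2

$$\begin{aligned}
\sum_{(x^k, y^k):\ \hat{P}_{x^k, y^k} = J_1} P(x^k)\, Q_\star(y^k)
&\le \sum_{x^k:\ \hat{P}_{x^k} = J_{1,x}} P(x^k) \sum_{y^k:\ \hat{P}_{y^k} = J_{1,y}} Q_\star(y^k) \\
&\le e^{-k(D(J_{1,x}\|P) + D(J_{1,y}\|Q_\star))} \\
&\le e^{-k D(J_{1,y}\|Q_\star)} \qquad (94)
\end{aligned}$$

where $J_{1,x}$ and $J_{1,y}$ denote the left and right marginals of
$J_1$, respectively, and where the last inequality follows by
non-negativity of divergence.
A similar calculation yields

$$\sum_{(x^{n-k}, y^{n-k}):\ \hat{P}_{x^{n-k}, y^{n-k}} = J_2} P(x^{n-k}, y^{n-k}) \le e^{-(n-k) D(J_2\|PQ)}. \qquad (95)$$

From (93), (94), (95) and Fact 1 we get

$$\mathbb{P}_1\big(\hat{P}_{C^n(1),\,Y_{\nu+i-n+1}^{\nu+i}} = J\big) \le \text{poly}(n) \times \max_{\substack{J_1, J_2:\ kJ_1 + (n-k)J_2 = nJ \\ k:\ \lceil n^{3/4} \rceil - 1 \le k \le n-1}} \exp\big[-k D(J_{1,y}\|Q_\star) - (n-k) D(J_2\|PQ)\big]. \qquad (96)$$

The maximum on the right-hand side of (96) is equal to

$$\max_{\substack{J_1, J_2:\ kJ_1 + (n-k)J_2 = nJ_{\mathcal{Y}} \\ k:\ \lceil n^{3/4} \rceil - 1 \le k \le n-1}} \exp\big[-k D(J_1\|Q_\star) - (n-k) D(J_2\|(PQ)_{\mathcal{Y}})\big]. \qquad (97)$$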



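We upper bound the argument of the above exponential via
the log-sum inequality to get

$$-k D(J_1\|Q_\star) - (n-k) D(J_2\|(PQ)_{\mathcal{Y}}) \le -n D\big(J_{\mathcal{Y}}\,\|\,\delta Q_\star + (1-\delta)(PQ)_{\mathcal{Y}}\big), \qquad (98)$$

where $\delta = k/n$. Using (98), we upper-bound expression (97)
by

$$\begin{aligned}
\max_{\delta:\ n^{-1/4} - n^{-1} \le \delta \le 1} \exp\big[-n D\big(J_{\mathcal{Y}}\,\|\,\delta Q_\star + (1-\delta)(PQ)_{\mathcal{Y}}\big)\big]
&\le \max_{\delta:\ n^{-1/4} - n^{-1} \le \delta \le 1} \exp\big[-n\, \Omega(\delta^2)\big] \\
&\le \exp\big[-\Omega(n^{1/2})\big] \qquad (99)
\end{aligned}$$

where for the first inequality we used Pinsker's inequality [7,
Problem 17 p. 58]

$$D(P_1\|P_2) \ge \frac{1}{2 \ln 2}\, \|P_1 - P_2\|^2,$$

and assume that $\mu$ is small enough and $n$ is large enough for
this inequality to be valid. Such $\mu$ and $n$ exist whenever the
distributions $Q_\star$ and $(PQ)_{\mathcal{Y}}$ are different.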
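Both ingredients of this step, the convexity bound (98) and Pinsker's inequality, can be checked numerically on small examples. In the sketch below the helper names and the four distributions are hypothetical, and divergences are computed in nats, where Pinsker's inequality reads $D(P_1\|P_2) \ge \tfrac{1}{2}\|P_1 - P_2\|^2$; the constant $1/(2 \ln 2)$ quoted from [7] is the same statement with the divergence measured in bits.

```python
import math

def kl(p, q):
    """D(p||q) in nats for distributions given as equal-length lists."""
    total = 0.0
    for a, b in zip(p, q):
        if a > 0:
            if b == 0:
                return math.inf
            total += a * math.log(a / b)
    return total

def mix(delta, p, q):
    """Convex combination delta*p + (1-delta)*q."""
    return [delta * a + (1 - delta) * b for a, b in zip(p, q)]

def both_sides_of_98(delta, J1, J2, Q_star, PQ_Y):
    """Per-symbol versions of the two sides of (98), with signs flipped."""
    split = delta * kl(J1, Q_star) + (1 - delta) * kl(J2, PQ_Y)
    merged = kl(mix(delta, J1, J2), mix(delta, Q_star, PQ_Y))
    return split, merged

def pinsker_gap(p, q):
    """D(p||q) - 0.5 * ||p - q||_1^2, which is non-negative in nats."""
    l1 = sum(abs(a - b) for a, b in zip(p, q))
    return kl(p, q) - 0.5 * l1 ** 2

split, merged = both_sides_of_98(0.25, [0.9, 0.1], [0.3, 0.7], [0.8, 0.2], [0.4, 0.6])
gap = pinsker_gap([0.45, 0.55], [0.5, 0.5])
```

The first pair satisfies `split >= merged`, which is (98) after dividing by $-n$, and the Pinsker gap is small but non-negative.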
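It then follows from (96) that

$$\mathbb{P}_1\big(\hat{P}_{C^n(1),\,Y_{\nu+i-n+1}^{\nu+i}} = J\big) \le \exp\big[-\Omega(n^{1/2})\big],$$

hence, from (91) and Fact 1 we get

$$\mathbb{P}_1(\tau_n = \nu + i) \le \exp\big[-\Omega(n^{1/2})\big]$$

for $i \in \{0, 1, \ldots, g(n)\}$. Finally a union bound over times
yields the desired result (89) since $g(n) = O(n)$.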

Appendix B 
Proof of Theorem 5 

The desired Theorem is a stronger version of [7, Corollary 1.9,
p. 107], and its proof closely follows the proof of the
latter.

Before proceeding, we recall the definitions of $\eta$-image and
$l$-neighborhood of a set of sequences.

Definition 4 ($\eta$-image, [7, Definition 2.1.2, p. 101]): A set
$\mathcal{B} \subset \mathcal{Y}^n$ is an $\eta$-image of a set $\mathcal{A} \subset \mathcal{X}^n$ if $Q(\mathcal{B}|x) \ge \eta$ for
all $x \in \mathcal{A}$. The minimum cardinality of $\eta$-images of $\mathcal{A}$ is
denoted $g_Q(\mathcal{A}, \eta)$.

Definition 5 ($l$-neighborhood, [7, p. 86]): The $l$-neighborhood
of a set $\mathcal{B} \subset \mathcal{Y}^n$ is the set

$$\Gamma^{l}\mathcal{B} \triangleq \{y^n \in \mathcal{Y}^n : d_H(\{y^n\}, \mathcal{B}) \le l\}$$

where $d_H(\{y^n\}, \mathcal{B})$ denotes the Hamming distance between
$y^n$ and $\mathcal{B}$, i.e.,

$$d_H(\{y^n\}, \mathcal{B}) = \min_{\tilde{y}^n \in \mathcal{B}} d_H(y^n, \tilde{y}^n).$$
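For small block lengths the $l$-neighborhood of Definition 5 can be enumerated directly. This is a brute-force sketch with hypothetical function names; it is exponential in $n$ and only meant to make the definition concrete.

```python
from itertools import product

def hamming(u, v):
    """Number of coordinates in which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(u, v))

def neighborhood(B, l, alphabet, n):
    """All length-n sequences within Hamming distance l of the non-empty set B."""
    return {
        y for y in product(alphabet, repeat=n)
        if min(hamming(y, b) for b in B) <= l
    }

B = {(0, 0, 0)}
blown_up = neighborhood(B, l=1, alphabet=(0, 1), n=3)
```

Here the single sequence grows into the four sequences of weight at most one; the Blowing Up Lemma used below concerns the regime where $l$ grows slower than $n$.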

As other notation, for a given conditional probability $Q(y|x)$,
$(x, y) \in \mathcal{X} \times \mathcal{Y}$, and $x^n \in \mathcal{X}^n$, we define the set

$$\mathcal{T}_{[PQ]}(x^n) = \{y^n \in \mathcal{Y}^n : |\hat{P}_{x^n, y^n}(a, b) - \hat{P}_{x^n}(a) Q(b|a)| \le q,\ \forall (a, b) \in \mathcal{X} \times \mathcal{Y}\}$$

for a constant $q > 0$. To establish Theorem 5, we make use of
the following three lemmas. Since we restrict attention to block
coding schemes, i.e., coding schemes whose decoding happens
at the fixed time $n$, we denote them simply by $(\mathcal{C}_n, \phi_n)$ instead
of $(\mathcal{C}_n, (\tau_n, \phi_n))$.

In the following, $\epsilon_n$ is always given by

$$\epsilon_n = (n + 1)^{|\mathcal{X}| \cdot |\mathcal{Y}|} \exp(-n q^2/(2 \ln 2)).$$
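Because of the polynomial prefactor, $\epsilon_n$ is larger than one for moderate $n$ and only becomes a meaningful probability bound for fairly large block lengths. The sketch below works with $\ln \epsilon_n$ to avoid overflow; the function names and the example parameters are hypothetical.

```python
import math

def log_epsilon(n, q, size_x, size_y):
    """ln of epsilon_n = (n+1)^{|X||Y|} * exp(-n q^2 / (2 ln 2))."""
    return size_x * size_y * math.log(n + 1) - n * q * q / (2 * math.log(2))

def first_n_below_one(q, size_x, size_y, n_max=10 ** 6):
    """Smallest n >= 1 with epsilon_n < 1, or None if not reached by n_max."""
    for n in range(1, n_max + 1):
        if log_epsilon(n, q, size_x, size_y) < 0:
            return n
    return None

n0 = first_n_below_one(q=0.1, size_x=2, size_y=2)
```

For binary alphabets and $q = 0.1$ the threshold is already in the thousands, which is why the lemmas are stated for $n \ge n_0(\gamma, q, |\mathcal{X}|, |\mathcal{Y}|)$.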

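Lemma 2: Given $\gamma \in (0, 1)$, $Q \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$, $P \in \mathcal{P}_n^{\mathcal{X}}$, and
$\mathcal{A} \subset \mathcal{T}_P$, there exist $(\mathcal{C}_n, \phi_n)$ for each $n \ge n_0(\gamma, q, |\mathcal{X}|, |\mathcal{Y}|)$
such that

1) $c^n(m) \in \mathcal{A}$, for all $c^n(m) \in \mathcal{C}_n$

2) $\phi_n^{-1}(m) \subset \mathcal{T}_{[PQ]}(c^n(m))$, $m \in \{1, 2, \ldots, M\}$

3) the maximum error probability is upper bounded by $2\epsilon_n$

4) the rate satisfies

$$\frac{1}{n} \ln |\mathcal{C}_n| \ge \frac{1}{n} \ln g_Q(\mathcal{A}, \epsilon_n) - H(Q|P) - \gamma.$$

Proof of Lemma 2: The proof closely follows the proof of
[7, Lemma 1.3, p. 101] since it essentially suffices to replace
$\epsilon$ and $\gamma$ in the proof of [7, Lemma 1.3, p. 101] with $2\epsilon_n$ and
$\epsilon_n$, respectively. We therefore omit the details here.
One of the steps of the proof consists in showing that

$$Q(\mathcal{T}_{[PQ]}(x^n)|x^n) \ge 1 - \epsilon_n \qquad (100)$$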

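for all $x^n \in \mathcal{X}^n$. To establish this, one proceeds as follows.
Given $P \in \mathcal{P}_n^{\mathcal{X}}$ let $\mathcal{D}$ denote the set of empirical conditional
distributions $W(y|x) \in \mathcal{P}_n^{\mathcal{Y}|\mathcal{X}}$ such that

$$|\hat{P}_{x^n}(a) W(b|a) - \hat{P}_{x^n}(a) Q(b|a)| > q$$

for some $(a, b) \in \mathcal{X} \times \mathcal{Y}$. We have

$$\begin{aligned}
1 - Q(\mathcal{T}_{[PQ]}(x^n)|x^n) &= \sum_{W \in \mathcal{D}} Q(\mathcal{T}_W(x^n)|x^n) & (101) \\
&\le \sum_{W \in \mathcal{D}} e^{-n D(W\|Q|P)} & (102) \\
&\le (n + 1)^{|\mathcal{X}| \cdot |\mathcal{Y}|} \exp\big(-n \min_{W \in \mathcal{D}} D(W\|Q|P)\big) & (103) \\
&\le (n + 1)^{|\mathcal{X}| \cdot |\mathcal{Y}|} \exp\big(-n \min_{W \in \mathcal{D}} \|PW - PQ\|^2/(2 \ln 2)\big) & (104) \\
&\le (n + 1)^{|\mathcal{X}| \cdot |\mathcal{Y}|} \exp(-n q^2/(2 \ln 2)) & (105)
\end{aligned}$$

which shows (100). Inequality (102) follows from Fact 3, (103)
follows from Fact 1, (104) follows from Pinsker's inequality
(see, e.g., [7, Problem 17, p. 58]), and (105) follows from the
definition of $\mathcal{D}$. ■

Lemma 3 ([7, Lemma 1.4, p. 104]): For every $\epsilon, \gamma \in (0, 1)$,
if $(\mathcal{C}_n, \phi_n)$ achieves an error probability $\epsilon$ and $\mathcal{C}_n \subset \mathcal{T}_P$,
then

$$\frac{1}{n} \ln |\mathcal{C}_n| \le \frac{1}{n} \ln g_Q(\mathcal{C}_n, \epsilon + \gamma) - H(Q|P) + \gamma$$

whenever $n \ge n_0(|\mathcal{X}|, |\mathcal{Y}|, \gamma)$.

Since this lemma is established in [7, Lemma 1.4, p. 104], we
omit its proof.

Lemma 4: For every $\gamma > 0$, $\epsilon \in (0, 1)$, $Q \in \mathcal{P}^{\mathcal{Y}|\mathcal{X}}$, and
$\mathcal{A} \subset \mathcal{X}^n$

$$\left| \frac{1}{n} \ln g_Q(\mathcal{A}, \epsilon) - \frac{1}{n} \ln g_Q(\mathcal{A}, \epsilon_n) \right| \le \gamma$$

whenever $n \ge n_0(\gamma, q, |\mathcal{X}|, |\mathcal{Y}|)$.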

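Proof of Lemma 4: By the Blowing Up Lemma [7,
Lemma 1.5.4, p. 92] and [7, Lemma 1.5.1, p. 86], given
the sequence $\{\epsilon_n\}_{n \ge 1}$, there exist $\{l_n\}$ and $\{\eta_n\}$ such that
$l_n/n \to 0$ and $\eta_n \to 1$ as $n \to \infty$, and such that the following two
properties hold.

For any $\gamma > 0$ and $n \ge n_0(\gamma, q, |\mathcal{X}|, |\mathcal{Y}|)$

$$\frac{1}{n} \ln |\Gamma^{l_n}\mathcal{B}| - \frac{1}{n} \ln |\mathcal{B}| \le \gamma \quad \text{for every } \mathcal{B} \subset \mathcal{Y}^n, \qquad (106)$$

and for all $x^n \in \mathcal{X}^n$,

$$Q(\Gamma^{l_n}\mathcal{B}|x^n) \ge \eta_n \quad \text{whenever } Q(\mathcal{B}|x^n) \ge \epsilon_n. \qquad (107)$$

Now, assuming that $\mathcal{B}$ is an $\epsilon_n$-image of $\mathcal{A}$ with $|\mathcal{B}| = g_Q(\mathcal{A}, \epsilon_n)$,
the relation (107) means that $\Gamma^{l_n}\mathcal{B}$ is an $\eta_n$-image
of $\mathcal{A}$. Therefore we get

$$\begin{aligned}
\frac{1}{n} \ln g_Q(\mathcal{A}, \eta_n) &\le \frac{1}{n} \ln |\Gamma^{l_n}\mathcal{B}| \\
&\le \gamma + \frac{1}{n} \ln |\mathcal{B}| \\
&= \gamma + \frac{1}{n} \ln g_Q(\mathcal{A}, \epsilon_n) \qquad (108)
\end{aligned}$$

where the second inequality follows from (106). Finally, since
$\eta_n \to 1$ and $\epsilon_n \to 0$ as $n \to \infty$, for $n$ large enough we have

$$g_Q(\mathcal{A}, \epsilon) \le g_Q(\mathcal{A}, \eta_n) \quad \text{and} \quad \epsilon_n \le \epsilon,$$

and therefore, since $g_Q(\mathcal{A}, \cdot)$ is nondecreasing in its second
argument, from (108) we get

$$\frac{1}{n} \ln g_Q(\mathcal{A}, \epsilon_n) \le \frac{1}{n} \ln g_Q(\mathcal{A}, \epsilon) \le \gamma + \frac{1}{n} \ln g_Q(\mathcal{A}, \epsilon_n)$$

yielding the desired result. ■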
We now use these lemmas to establish Theorem 5. Choose
$\epsilon, \gamma > 0$ such that $\epsilon + \gamma < l$. Let $(\mathcal{C}_n, \phi_n)$ be a coding scheme
that achieves maximum error probability $\epsilon$. Without loss of
generality, we assume that $\mathcal{C}_n \subset \mathcal{T}_P$. (If not, group codewords
into families of common type. The largest family of codewords
has error probability no larger than $\epsilon$, and its rate is essentially
the same as the rate of the original code $\mathcal{C}_n$.) Therefore

$$\begin{aligned}
\frac{1}{n} \ln |\mathcal{C}_n| &\le \frac{1}{n} \ln g_Q(\mathcal{C}_n, \epsilon + \gamma) - H(Q|P) + \gamma \\
&\le \frac{1}{n} \ln g_Q(\mathcal{C}_n, l) - H(Q|P) + \gamma \\
&\le \frac{1}{n} \ln g_Q(\mathcal{C}_n, \epsilon_n) - H(Q|P) + 2\gamma \qquad (109)
\end{aligned}$$

for $n \ge n_0(\gamma, |\mathcal{X}|, |\mathcal{Y}|)$, where the first and third inequalities
follow from Lemmas 3 and 4, respectively, and where the
second inequality follows since $g_Q(\mathcal{C}_n, \epsilon)$ is nondecreasing
in $\epsilon$. On the other hand, by Lemma 2, there exists a coding
scheme $(\mathcal{C}_n', \phi_n')$, with $\mathcal{C}_n' \subset \mathcal{C}_n$, that achieves a probability of
error upper bounded by $2\epsilon_n$ and such that its rate satisfies

$$\frac{1}{n} \ln |\mathcal{C}_n'| \ge \frac{1}{n} \ln g_Q(\mathcal{C}_n, \epsilon_n) - H(Q|P) - \gamma \qquad (110)$$

for $n \ge n_0(\gamma, q, |\mathcal{X}|, |\mathcal{Y}|)$. From (109) and (110) we deduce
that the rate of $\mathcal{C}_n'$ is lower bounded as

$$\frac{1}{n} \ln |\mathcal{C}_n'| \ge \frac{1}{n} \ln |\mathcal{C}_n| - 3\gamma$$

whenever $n \ge n_0(\gamma, l, q, |\mathcal{X}|, |\mathcal{Y}|)$. This yields the desired
result. ■